COMPARISON OF SUBJECTIVE AND OBJECTIVE JUDGMENTS
OF CHILDREN’S DRAWINGS
Betty Lark-Horovitz


Educational Department, The Cleveland 
Museum of Art* 


The validity of subjective judgment in art 
has often been questioned. On the other 
hand, attempts to establish a basis for objec- 
tive judgments, by defining specific art values 
which may serve as criteria, have been 
attacked and laughed at. The reasons for 
dismissing objective judgment are generally 
that art is something personal, an emotional 
response; that it cannot be analyzed scien- 
tifically or approached by arguments of 
logic; or that so-called objective judgments 
cannot, in the end, be more valuable than 
subjective judgments. 

In order to proceed objectively we have 
defined certain qualities, through discussion 
and by means of continuous re-terming of 
concepts, in such a way that ambiguity and 


vagueness are eliminated as far as possible.


Since aesthetic concepts are generally used 
in a vague manner which permits various in- 
terpretations, their compound meanings were 
broken up and reduced to the several factors 
which they contain. Each factor was again 
defined. Whenever a drawing was analyzed 
for the qualities it contained, the same defini- 
tions were used consistently in order to avoid 
personal-subjective interpretations of con- 
cepts. 

Our analysis of a drawing has not the 
direct aim of evaluating it as a drawing. It is 
rather in the nature of a minute and accurate 
description of all observable qualities of the 
drawing, which is followed by a classification 
of the findings. If desirable, a judgment can 
be arrived at from the material assembled 
without even consulting the drawing. 

Subjective judgment also proceeds by 
description and observation. The difference 

* [...] grant of the General Education Board of New York to the
Educational Department of The Cleveland Museum of Art. The Author
wishes to thank Dr. Thomas Munro for reading this manuscript and
for his suggestions.


between it and objective judgment lies in the 
degree of accuracy reached by means of con- 
sistent use of definitions, enumeration of all 
available facts, and elimination of personal 
responses which are apt to obscure factual 
observations by the judge, and to which even 
persons who work constantly with objective 
methods will yield. 

In our analysis of children’s drawings** for 
the age levels 6-15, in which we checked all 
qualities which were perceptible in any of the 
actual drawings at every given age level, we 
arrived at age-level norms and found that 
certain art qualities appear, grow or decline 
between the ages of 6 and 15. Objective 
methods of checking qualities were used by 
applying consistently definitions which had 
been agreed upon. 

In order to compare the resulting objective 
judgments with subjective judgments made 
by persons who rank high in the artistic pro- 
fessions, we asked nine judges for their opin- 
ions of a group of 57 drawings made by 12- 
year-old boys and another group of 59 
drawings made by 12-year-old girls. All 
drawings had been part of our objective
analysis. 

The nine judges*** were: two university 
professors who specialize in aesthetics, two 
professional artists of repute (painters), two 
curators of museum collections, and three 
art educators (a public school art director 
and two supervisors). They all were asked 
the same question: “Kindly write on the 


** This analysis, carried out under a grant of the General
Education Board of New York to the Educational Department
of The Cleveland Museum of Art, was partly used for,
and formed the basis of, an estimation of children’s drawing ability.


*** The Author wishes to express the gratitude of the de- 
partment to the nine judges who made this experiment 
possible by their helpful assistance and cooperation. 
blank to the left of each drawing whether
you would classify the drawing as belonging 
to the ‘best’ or ‘worst’ of each of these groups. 
Explain for either classification the basis of 
your judgment in terms of presence or absence 
of art (drawing) qualities.” 

The results showed, first of all, that the 
consensus of opinion of the nine judges was 
limited to a very few cases. Graphs I and II 
give the choices of “best” and “worst” draw- 
ings of boys and girls by the nine judges. 

The choices of boys’ best drawings have a 
majority of judges’ votes for only five draw- 
ings, though 18 in all were chosen. The worst 
drawings have a majority of votes for only 
three out of 16 drawings.* This shows how 
little agreement exists between the judges 
concerning best and worst; that is, the ex- 
tremes of the group. Moreover, when we 
compare the choices of judges of the same 
profession, we find that the two aestheticians 
agree in only two cases out of the 19 choices 
they both made; the two artists agree in
only two cases out of their 20 choices; the 
two curators agree on three choices out of 
21; and of the three art educators, two agree 
on 7 choices out of 17, and all three agree on 
three choices out of 29. Furthermore, there 
are two drawings, numbers 2066 and 3413, 
which were pronounced by one or two judges 
as best and by others as worst. Since 18 
drawings were pronounced best and 16 worst,
only 23 remained as average. 
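
The agreement counts quoted above can be turned into proportions for easier comparison. The following is only a minimal sketch: the pairs of numbers are those given in the text, while the dictionary labels and the function name `agreement_rate` are illustrative assumptions and not part of the original study.

```python
from fractions import Fraction

# (choices agreed upon, total choices made) as reported in the text
# for the boys' drawings; the labels are illustrative only.
agreement = {
    "aestheticians": (2, 19),
    "artists": (2, 20),
    "curators": (3, 21),
    "educators, two of three": (7, 17),
    "educators, all three": (3, 29),
}


def agreement_rate(agreed, total):
    """Share of choices on which the judges of one profession agree."""
    return Fraction(agreed, total)


rates = {group: agreement_rate(a, t) for group, (a, t) in agreement.items()}

for group, rate in rates.items():
    print(f"{group}: {rate} = {float(rate):.1%}")
```

Even the highest of these proportions, 7 out of 17, is well below one half, which is consistent with the author's remark on how little agreement exists within each profession.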


The graph of choices among the girls’ 
drawings (Graph II) shows a still greater 
scattering. Of the 59 drawings in the girls’ 
groups, 26 were chosen as best by the judges 
and 15 as worst. Thus the remaining 18 
average drawings appear almost as if they 
were exceptions. 

According to Graph II, no majority of 
opinion could be reached concerning best 
drawings and only three concerning worst 
drawings. Aestheticians, curators, artists, and 
educators differ in their opinions among 
themselves. It seems that more judges would 
have had to be consulted in order to reach a 
majority. 

The more interesting part of this study of 
variability in subjective judgments among 

* Both groups of drawings, those chosen as best and those
chosen as worst, show that the children who made them
have the same mean I.Q., that is, 116. The range differs
[...] in that for the best drawings it is 103 to 140, and
for the worst, 115 to 117. Consequently in this case
intelligence [...] out as a factor which may have influenced
[...] indirectly, through some properties perceptible in the
character of the drawings.
judges who are art experts, is the analysis of
the comments which explain their choices, 
and also the differences in observation of 
apparently similar drawing qualities of vari- 
ous drawings. 


All comments** contain one or more qualities
which belong to one of the three following
categories: representative qualities, that is,
drawing qualities which mainly enhance the
actual representation of the subject, e.g.
showing by placement what the situation is
supposed to be or what the figures are doing,
or more or less correct drawing of the various
objects or people; aesthetic qualities, i.e.
qualities which appear somehow detached from
representation and which manifest themselves
in the arrangement and patternization of
colors, lines or masses; technical qualities,
i.e. drawing qualities which are due to the
handling of the medium in its relation to the
representation, as actually shown or
suggested.


Doubtless none of these three groups of 
qualities can exist by itself, or can be en- 
tirely separated for the purpose of analysis. 
Yet one of the three may dominate, and thus 
influence the judgment; or the judge may 
find one of the three (for instance aesthetic 
qualities) more of a determinant regarding a 
best or worst drawing, and be guided accord- 
ingly. For example, a correct drawing of a 
landscape, in which, however, color organiza- 
tion or balance of masses is neglected, may 
not be chosen because the judge is led to his 
choice by aesthetic qualities rather than 
representative ones. 

The following table*** presents the qualities
mentioned most often in the judges’ comments
as bases for their choices. The percentages
in the table are based on the number of
elements or qualities mentioned by each
judge. Thus, if judge III made 20 remarks
concerning representation, 2 concerning
technique and 15 concerning aesthetic
qualities, he made a total of 37 remarks;
therefore 54% of his judgment was based on
representation, or the way the picture parts
are drawn, 5½% on technique, and 40½% of
his judgment rested on aesthetic
considerations. Since the content of the comments for
best and worst, boys and girls, differs very 

** See sample of comments in Appendix I. 

TABLE OF QUALITIES MENTIONED IN THE COMMENTS OF THE JUDGES
AS BASES FOR THEIR CHOICES
(In percentages)
(The Roman numerals refer to the different judges)

                     II   III    IV     V    VI   VII  VIII    IX
Representation       56    55    55    53    58    61    62    52
Aesthetic Quality    34    36    39    38    34    36    37    40
Technique            10     9     6     9     8     3     1     8





little, all explanatory remarks were combined 
and treated together. 
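
The way the percentages in the table are derived can be checked with the judge III example given in the text (20, 2 and 15 remarks out of 37). This is a minimal sketch; the function name `remark_shares` and the dictionary keys are illustrative assumptions, and only the three counts come from the article.

```python
def remark_shares(counts):
    """Convert a judge's remark counts per category into percentages."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Counts from the worked example in the text (judge III).
judge_iii = {"representation": 20, "technique": 2, "aesthetic": 15}
shares = remark_shares(judge_iii)

for category, pct in shares.items():
    print(f"{category}: {pct:.1f}%")
```

Rounded to the nearest half per cent, the three shares are about 54, 5½ and 40½, and they sum to 100, as every column of the table does.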


The distribution of remarks concerning 
drawing or representative qualities, aesthetic 
means, and techniques, varies little among 
the nine judges. All judges, perhaps uncon- 
sciously, seem to think of representation as 
an essential in making a picture, of aesthetic 
means of expression as the next important 
factor, but consider technique of slight im- 
portance for children. 

Hence, there is no doubt that the judges 
agree on the importance of certain groups of 
qualities, and that on the presence of these 
qualities they base their judgment of a best 
drawing and on their absence that of a worst 
drawing. 

What is, then, the reason why they differ 


so strongly in their choices; why this differ- 
ence of opinion goes so far as to impel one 
judge to pronounce, for instance, drawing 
3413 best, while two other judges call it 
worst? 

Judge II, who considered number 3413 


best, says: “Hard to estimate. It is too 
simple, but there is something about the de- 
sign and the clarity of the boat and the boy 
that I like.” In other words, the design—an 
aesthetic quality—, clarity of the boat—rep- 
resentational quality—, were the reasons for 
judge II’s choice. Judge IV, pronouncing the 
same drawing worst, says: “Vague, discon- 
nected in movement, interest scattered.” In 
other words, vague—lack of representational 
quality, disconnected in movement—lack of 
aesthetic quality, interest scattered—lack of 
organizational, i.e. of aesthetic qualities, 
underlie his judgment. And judge VII says 
of the same picture: “Drawing lacks syn- 
thesis. Parts unrelated, and shows lack of 
imagination.” In other words: lacks synthesis,
parts unrelated (absence of organizational,
i.e. aesthetic qualities); lack of imagination
(lack of representational and aesthetic
qualities).


Thus differences in standards are not the 
cause of the scattering of choices, but rather 
differences in their practical application. The 
standards, however, are vague, as can be seen 
from nearly all comments. To say of a draw- 
ing that it is “well composed”, or it has “no 
positive value”, or it is “below age level, yet 
somehow transcendent”, or “it is like a 
Chinese painting” explains little about the 
qualities in the drawing. “Well composed” is 
a general term which, unless broken up into 
more specific components, cannot be applied. 
The expression “no positive value” raises the 
question of what a positive value is, and in 
what way, after being defined, it applies to a 
drawing. Again, the remark “it is like a 
Chinese painting” implies that similarity to 
any Chinese painting, be it good or bad, as 
long as it is Chinese, heightens the value of 
the drawing. 


An objective analysis of a drawing does 
not directly make a choice between good or 
bad, best or worst. It states, however, in 
detailed and fairly definite terminology, the 
presence of any qualities in a drawing, and 
wherever necessary, its degree. From a chart 
thus obtained (Chart I) for age level 12, we 
find a number of quality checks for those of 
the boys’ and girls’ best drawings which were 
chosen by the nine judges. 

Such objective checks can be compared 
with the norms which were established with 
the help of the charted qualities of the chil- 
dren’s drawings of the age levels 6-15 and 
which, for age level 12, can be seen in 
Chart II. 


The best drawings of the boys, such as 
697, 1338, 1343, 1804 and 4034 which were 
chosen by a majority of judges, would have 
been marked superior also as a result of 
objective quality checks, since they combine
a number of qualities which, according to 
Chart II are all outstanding, “rare”, for this 
age level; in fact are still rare for age level 
15. Of the drawings which were chosen by 
one, two or three judges, such as 1802, 3464,
3537 and others, some show no outstanding 
quality whatsoever, some check on a few. 
Objective analysis would rank these drawings 
among average ones. Worst drawings would 
be classified as poor or average according to 
our objective analysis. 

There is one interesting question which is 
being answered unknowingly by the judges’ 
comments. All drawings marked best by a 
majority of judges are representationally 
true-to-appearance; some also show perspec- 
tive. A few of the drawings chosen as best by 
single judges are representationally schematic 
or mixed, that is, average for age-level 12, 
while some of these, according to objective 
analysis, are checked for not a single out- 
standing quality, others are checked for one 
or two of the higher qualities, such as blended 
tints, texture or moderately effective use of 
medium. But, and this is significant, not a 
single drawing was marked worst which was 
representationally advanced. The interpreta- 
tion of this fact, and the fact that all draw- 
ings marked by a majority as best are repre- 
sentationally accelerated, and that the 
amount of comments on representation alone 
is about 60%, can only point to the follow- 
ing: representation is an important and de- 
ciding factor in judging children’s drawings, 
and especially so when the drawing is not 
sufficiently supported by aesthetic qualities; 
on the other hand, a representationally aver- 
age drawing qualifies as superior only if it is 
enriched by a number of technical and 
aesthetic qualities. The conclusion is that, in 
an objective analysis, checks on representa- 
tional qualities must be weighted heavily 
because of their importance. 

Of the two drawings, 2066 and 3413, which 
were ranked both best and worst, No. 2066 
is checked on a single outstanding quality: 
intentionally indefinite form, which is a 
quality of sophistication, the value of which 
in the drawing depends on other qualities 
with which it may be linked. Number 3413, 
while representationally average, is checked 
for “subtle or delicate” line, which quality 
may have been a dominant determinant in
the particular judge’s mind who marked it 
best, while the opinions of the two judges 
who classified it as worst were weighted by 
the absence of other qualities which they 
deemed more important. 


Thus, single judges will frequently consider
a drawing outstanding because one or two
qualities appeal to them strongly enough to
obliterate the lack of all other qualities,
while objective analysis would mark it as
average. If, however, all nine
judges were of the opinion that the strength 
of the one outstanding quality carries the 
drawing, such judgment would be decisive. 
In fact, although norms can be established 
with the help of objective analysis, the im- 
portance of each rare or superior quality 
remains undetermined. In other words, one 
cannot add the checks on superior qualities 
and say: Johnny has five superior qualities, 
Bobby has seven, therefore Bobby has more 
ability than Johnny. Exactly the opposite 
may be true. Yet the importance of the in- 
tensity of a single quality, or that of the 
accumulation of several qualities, cannot be 
determined from the responses of one or two 
judges, since their responses differ. They do 
not differ, however, when a greater number 
of outstanding qualities show in one single 
drawing. 


Summarizing, we may say: unless a suffi- 
cient number of expert judges are consulted, 
subjective judgments, though based on sim- 
ilar aesthetic principles, differ to such an 
extent that they seem unreliable. Compared 
with them, judgments based on objective 
analysis appear more reliable. Since it is 
difficult and inconvenient to assemble nine 
or more art experts for purposes of deciding 
children’s art abilities, and since objective 
analysis could either be handled by a trained 
specialist for groups of schools or by a central 
office for a whole state, the establishment and 
standardization of norms for all branches of 
art activity and all age levels up to college 
age is recommended. 
[GRAPH I. Boys’ best drawings and boys’ worst drawings, by identification number of drawing. Roman numerals identify judges.]
APPENDIX II

DEFINITION OF HEADINGS OF CHARTS I AND II


Representation 
Schema: 


Schematic drawings represent objects 
as they are believed to be, regardless 
of their actual, visual appearance. 
Size relationships are incorrect; for 
instance, the head of the figure may 
be three times as large as the body, 
or, the body and legs may be elon- 
gated many times their length while 
the head and arms remain small. All 
objects are drawn part by part, out- 
lining each, joining them in an addi- 
tive process, during which the parts 
are often connected at the wrong 
places. Top, side and front views are 
combined, objects and figures are 
often made transparent. These char- 
acteristics apply not only to single 
objects but also to compositions. 


Mixed Stage: 


It bears some of the characteristics 
of both the true-to-appearance and 
schematic stages combined. Interme- 
diate between the two. 


True-to-appearance: 


These drawings represent objects and 
human figures approximately as they 
appear when seen from one single 
viewpoint. Parts which are hidden in 
a particular view, though existent, 
are not represented. Size relation- 
ships are fairly correct. Parts are 
joined at the correct places. Though 
outlines overlap and thus indicate 
depth, modeling or representation of 
space through perspective may, or 
may not be represented. 


Perspective : 


Linear perspective is a representation 
of objects as if in three dimensional 
space by means of diminishing sizes, 
converging lines and foreshortening. 


One Plane:* 


The objects are all placed in one pic- 
ture plane, not overlapping, and as 
though they all were at equal dis- 
tances from the observer. 

There is no depth suggested, no per- 
spective, i.e. convergence and no 
diminution of sizes. 


Many Planes:* 


The use of many planes is a more 
advanced development. It necessarily 
implies the suggestion of distance 
through overlapping of objects, in- 
dicating that some objects are far- 
ther from the observer than others. 
Many planes may be indicated with 
or without linear perspective, i.e. 
convergence. 
Planes showing distance :*


The use of planes showing distance is
an advanced [...] of representation.
They usually occur only in landscapes,
and the distance may be shown either
through perspective, through multiplicity
of planes with diminishing sizes, or
through variation of light and dark,
i.e. paling of colors at the horizon
or variation of hues.


Motion successful: 


It is represented in the human figure 
or animals by flexibility of the body, 
bending of limbs, coordination of 
parts; in inanimate objects by lines 
symbolizing movement, such as cir- 
cular whirls around wheels, parallel 
lines to indicate air rushing past, or 
by blurred parts. 


Aesthetic Means 
Organized Grouping; Unusual: 


It is checked when three or more 
objects or figures are definitely organ- 
ized into a group, balanced and inter- 
related in regard to aesthetic effect 
by means of line-, color- or area- 
arrangement. 


Line: 
Bold:* 


Bold lines are those drawn with 
sufficient vigor to give a contrast 
with the surrounding areas and must 
show control of the medium as well 
as use for an aesthetic effect. 


Subtle-delicate :* 


Subtle lines show a controlled and 
varying gradation of width and thick- 
ness in order to emphasize parts of 
the drawing. Delicate lines show precision
and decisiveness though applied delicately
and faintly for the sake of an aesthetic
effect.


Decorative use: 


Decorative line is used either rhythmically
or in repeats to achieve through this
technique an aesthetic effect.


Area:
[...]:

These areas are filled in erratically
and negligently, producing an uneven
effect, either because of technical
inability or lack of interest or
uncertainty about the desired effect.
Graded :* 


In these areas, a hue or several hues 
are shaded from light to dark. 


Blended tints:* 


These are such that various hues are
mingled in one area.
Bold: 


Bold areas are not merely thick in 
layer but suggest, through color and 
execution, daring contrasts or com- 
binations. 


* See footnote next page. 





March, 1942] 


Intentional Texture :* 
These areas show intentional varia- 
tions of the surface for the sake of 
representing some texture, e.g. grass,
or for the sake of an aesthetically 
important effect in contrast to other 
areas. 


Color: 


Variation in local colors: 
Local colors are such colors as are 
characteristic or “true” for known 
objects, people or animals, e.g. grass 
is green, forests—even in the distance 
—are green, sky and water are blue, 
etc. Variations [...] shadings of
the true colors, e.g. several greens for 
trees or grass, several blues for sky 
or water. 

Distance :* 
Distance is indicated by light or 
greyed hues, e.g. purplish mountains. 

Outstanding :* 
Outstanding use of color is an unusu- 
ally effective use obtained either by 
contrasts, subtlety, or organization 
(symmetry, isolated color spots). 

Modeling :* 
This is represented by sufficient grading
with the help of the medium (pencil or
[...]) to suggest solidity and shape. It
may include shading from dark to light,
whether or not related to a definite
source of light.

Decorative Arrangement: 
Such an arrangement shows a con- 
scious distribution and emphasis of 
lines, colors or shapes either by using 
some modes of balance or by repeats 
or by subdivision into units which 
elaborate one or more themes for the 
sake of an aesthetic effect, e.g. 


symmetry. 




Planned Asymmetry :* 
Intentional asymmetry is a purposeful
grouping of objects in an informal
manner so as to attain a satisfactory
feeling of stability without the use
of a central axis.


Technique 
Use of Medium: 


Consistent: 

A medium is used consistently when
the tools and colors or ink are applied
in such a way that they follow
the [...] functions of the tools.
For example, a fine-point tool will
not make broad strokes, or a coarse
tool, fine lines. Thus, to be consistent,
the number of details, when rendered
with a blunt crayon point, must be
reduced or the details rendered in
large sizes.


Moderately effective :* 
It is achieved when the effect is be- 
yond plain consistency. 


Unusually effective :* 
Such use is observable when the 
medium is handled with such skill 
that the effect obtained surpasses by 
far the usual expectations for this 
medium. 


Inconsistent :* 

A medium is used inconsistently 
when effects are attempted which are 
incompatible with it and which point 
to the use of a different medium, e.g. 
when fine lines, closely drawn, are 
attempted with a blunt crayon though 
they would best be drawn by pencil- 
point. 


Forms intentionally indefinite :*

They are achieved by outlines purposefully
blurred or changed in order to interrelate
with the background.


* The content of this definition, though re-worded, is
identical with the one in Horovitz, Barnhart and Sills,
“Graphic Work-Sample Diagnosis, An Analytic Method of
Estimating Children’s Drawing Ability”, The Cleveland
Museum of Art, Cleveland, Ohio, 1939.





STEPS FOR THE APPLICATION OF THE JOHNSON-NEYMAN 
TECHNIQUE—A SAMPLE ANALYSIS 
Robert H. Koenker
University of Minnesota 


Carl W. Hansen
Quincy, Ill., Public Schools 


In a recent investigation by one of the 
above authors, an attempt was made to dis- 
cover if certain underlying characteristics 
distinguish excellent achievers from poor 
achievers in two-figure division. A screening 
test in two-figure division was administered 
to 283 pupils in 6B classes in 9 elementary 
schools. After the 283 pupils were found to 
be homogeneous in division with respect to 
both variability and mean achievement, they 
were grouped as one distribution. The upper 
one-third, the 90 best achievers, were selected 
as excellent achievers, and the lower one- 
third, the 90 poorest achievers, were selected 
as poor achievers. The two groups were then 
measured on a number of factors thought to 
be associated with ability in two-figure divi- 
sion. The two groups were first compared on 
these factors by means of the “t” test for the
significance of the difference between two 
means. This comparison showed that excel- 
lent achievers were significantly superior to 
poor achievers on all factors. 
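
A comparison of two group means by a "t" test can be sketched as follows. The article does not state which form of the test was used, so the pooled-variance version for two independent samples shown here is an assumption, and the two short score lists are invented solely to make the example runnable; they are not data from the study.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's t for the difference between two independent means,
    using a pooled estimate of the variance (assumed form of the test)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    t_value = (mean(sample_a) - mean(sample_b)) / std_error
    return t_value, n_a + n_b - 2  # t and its degrees of freedom


# Hypothetical scores on one factor, for illustration only.
excellent = [25, 27, 24, 26, 28]
poor = [20, 18, 22, 19, 21]

t_value, df = pooled_t(excellent, poor)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The resulting value is then referred to a table of "t" for the stated degrees of freedom to judge the significance of the difference.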


Since excellent achievers might be superior 
to poor achievers on these factors because of 
such variables as mental age and chronolog- 
ical age, it seemed necessary to compare the 
two groups with the effects of these two vari- 
ables ruled out. This comparison could be 
made by means of a matching technique, but 
this approach is of limited value, because so 
many cases are often lost in matching that 
final groups are not representative of initial 
groups. To overcome this difficulty the 
Johnson-Neyman technique was used.* This
technique consists essentially in testing the 
statistical significance of the best estimate 
of the difference in achievement between two 
groups, when the two groups are matched 
statistically on two basic characters. This 


* Palmer O. Johnson and J. Neyman, “Tests of Certain
Linear Hypotheses and Their Application to Some Educational
Problems,” Statistical Research Memoirs, I (June, 1936),
72-93. Those interested in a discussion of the mathematical
basis of the technique may refer to this original source.
technique eliminates the need of an
individual-for-individual matching technique, 
and also makes it possible to determine at 
what level of significance the two groups 
differ, as well as to determine the limits of 
the regions wherein the differences between 
the two groups are significant. This technique 
developed by Johnson and Neyman is rapidly 
becoming recognized as one of great value to 
experimenters in education; it has been used 
to great advantage in many recent doctorate 
theses at the University of Minnesota and 
elsewhere. A sample analysis of the steps in- 
volved in this technique is given in the fol- 
lowing pages, for the benefit of those who 
may wish to apply it to the study of their 
problems. The basis of the analysis is an ex- 
planation of the technique as used in compar- 
ing 90 excellent achievers in division and 90 
poor achievers in division on ability in sub- 
traction,** when the effects of mental age 
and chronological age have been statistically 
controlled. 

Step One. Compute the basic statistics for
the dependent variable, subtraction, and the 
independent variables, mental age and chron- 
ological age. Basic computations involve 
means, standard deviations, and _ inter- 
correlations of the variables. Pertinent data 
are shown in Table I. The statistics for ex- 
cellent achievers are designated by the sub- 
script 1, the prime, and the letter z. The 
corresponding statistics for poor achievers 
are designated by the subscript 2, the double 
prime, and the letter u.
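
The basic statistics called for in Step One can be sketched as below. This is an illustration under stated assumptions: the standard deviations use the divisor N (the article does not say whether N or N - 1 was used), the function names are hypothetical, and the five-case data set is invented purely so that the example runs.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Standard deviation with divisor N (an assumption, see lead-in)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(a, b):
    """Product-moment correlation between two equally long score lists."""
    ma, mb = mean(a), mean(b)
    cross = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cross / (len(a) * std_dev(a) * std_dev(b))


def basic_statistics(x, y, z):
    """Means, standard deviations and intercorrelations for one group:
    x = mental age, y = chronological age, z = subtraction (code scores)."""
    return {
        "mean_x": mean(x), "mean_y": mean(y), "mean_z": mean(z),
        "sd_x": std_dev(x), "sd_y": std_dev(y), "sd_z": std_dev(z),
        "r_xy": pearson_r(x, y),
        "r_xz": pearson_r(x, z),
        "r_yz": pearson_r(y, z),
    }


# Hypothetical code scores for five pupils, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
z = [2, 4, 6, 8, 10]
stats = basic_statistics(x, y, z)
print(stats)
```

The same function would be applied once to the excellent achievers and once to the poor achievers to fill the two columns of a table such as Table I.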

Step Two. Translate the problem into the 
form of a linear hypothesis. The null hypoth- 
esis which is set up for this study is that in 
the population from which the sample is 
drawn, the difference between excellent and 
poor achievers with respect to achievement in 
subtraction is zero, when the effects of chrono- 


** In the original [...] excellent and [...] achievers were
[...] Johnson-Neyman [...]
TABLE I

STATISTICS COMPUTED FOR APPLICATION OF
JOHNSON-NEYMAN ANALYSIS FOR
SUBTRACTION TEST*

                                         Excellent Achievers          Poor Achievers
Mean, mental age                         \bar{x}_1 = 15.122222        \bar{x}_2 = 10.066667
Mean, chronological age                  \bar{y}_1 = 10.833333        \bar{y}_2 = 14.022222
Mean, subtraction                        \bar{z} = 25.277778          \bar{u} = 20.733333
Standard deviation, mental age           \sigma'_x = 4.484115         \sigma''_x = 4.642317
Standard deviation, chronological age    \sigma'_y = 4.674398         \sigma''_y = 5.975836
Standard deviation, subtraction          \sigma'_z = .954845          \sigma''_u = 5.836285
Correlation, mental and chron. age       r'_{xy} = -.089145           r''_{xy} = -.199112
Correlation, mental age and subtraction  r'_{xz} = .233412            r''_{xu} = .103590
Correlation, chron. age and subtraction  r'_{yz} = .050203            r''_{yu} = -.283369
Number of cases                          N_1 = 90                     N_2 = 90


* To simplify computations all the values
used in this analysis are in terms of code
scores. To obtain the code scores the midpoint
of the lowest raw score interval was subtracted
from each raw score, and the remainder was
divided by the height of the interval. The raw
scores, raw means, and raw standard deviations
may be obtained from their respective code
values by use of the following formulas:

X = 3x + 109   (mental age)
Y = y + 123    (chronological age)
Z = z + 4      (subtraction score)
logical age and mental age have been
removed. This hypothesis has been designated
as H(x’, y’).* The problem then reduces to
the test of the hypothesis H(x’, y’) that no
significant difference exists between
excellent and poor achievers in ability
in subtraction, and the determination of a 
region of significance if the hypothesis H (x’, 
y’) is rejected at a specified level. 

Step Three. Obtain an estimate of the 
difference in subtraction between excellent 
and poor achievers for each set of values of 
the basic matching characters. To obtain 
this estimate, the linear regression of sub- 
traction upon the basic matching characters, 
mental age and chronological age, is found 
separately for excellent and poor achievers, 
and then the difference between these two 
regressions is found. This difference is 
labelled F(x’, y’) and the equation is as 
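follows:

$$
F(x', y') = \left[\bar{z} + \frac{r'_{xz} - r'_{xy}\,r'_{yz}}{1 - (r'_{xy})^{2}} \cdot \frac{\sigma'_z}{\sigma'_x}\,(x' - \bar{x}_1) + \frac{r'_{yz} - r'_{xy}\,r'_{xz}}{1 - (r'_{xy})^{2}} \cdot \frac{\sigma'_z}{\sigma'_y}\,(y' - \bar{y}_1)\right]
- \left[\bar{u} + \frac{r''_{xu} - r''_{xy}\,r''_{yu}}{1 - (r''_{xy})^{2}} \cdot \frac{\sigma''_u}{\sigma''_x}\,(x' - \bar{x}_2) + \frac{r''_{yu} - r''_{xy}\,r''_{xu}}{1 - (r''_{xy})^{2}} \cdot \frac{\sigma''_u}{\sigma''_y}\,(y' - \bar{y}_2)\right]
$$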
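
The regression of subtraction upon the two matching characters can be computed directly from means, standard deviations and intercorrelations. The sketch below uses the standard two-predictor formulas, which reproduce the coefficients reported in the text for the excellent achievers (24.347216, .051061 and .014622); the function name and argument names are illustrative assumptions, and the correlation .233412 is taken from the substituted equation in the text.

```python
def regression_plane(mean_x, mean_y, mean_z, sd_x, sd_y, sd_z, r_xy, r_xz, r_yz):
    """Return (intercept, b_x, b_y) of the regression of z upon x and y,
    computed from summary statistics only."""
    denom = 1.0 - r_xy ** 2
    b_x = (sd_z / sd_x) * (r_xz - r_xy * r_yz) / denom
    b_y = (sd_z / sd_y) * (r_yz - r_xy * r_xz) / denom
    intercept = mean_z - b_x * mean_x - b_y * mean_y
    return intercept, b_x, b_y


# Code-score statistics for the excellent achievers, as given in the text.
intercept, b_x, b_y = regression_plane(
    mean_x=15.122222, mean_y=10.833333, mean_z=25.277778,
    sd_x=4.484115, sd_y=4.674398, sd_z=0.954845,
    r_xy=-0.089145, r_xz=0.233412, r_yz=0.050203,
)
print(f"z(x', y') = {intercept:.6f} + {b_x:.6f} x' + {b_y:.6f} y'")
```

Calling the same function with the statistics of the poor achievers gives the second regression, and F(x', y') is the difference between the two planes evaluated at the same pair of values.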


Substituting values from Table I in the first 
half of the foregoing equation, the regression 
equation for the excellent achievers in sub- 


traction is obtained. Substituting the values, 
the equation is 


$$
z(x', y') = 25.277778 + \frac{(.233412) - (-.089145)(.050203)}{1 - (-.089145)^{2}} \cdot \frac{.954845}{4.484115}\,(x' - 15.122222) + \frac{(.050203) - (-.089145)(.233412)}{1 - (-.089145)^{2}} \cdot \frac{.954845}{4.674398}\,(y' - 10.833333).
$$

Carrying through the computations, an estimate
of the adjusted mean in subtraction for
the excellent achievers is obtained. This
value is

$$
z(x', y') = 24.347216 + .051061x' + .014622y'.
$$

* [Table I footnote, continued] A code score of 15 in mental age can be
transformed to a raw score by multiplying 15 by 3 and adding 109. A code
score of 10 in chronological age can be transformed to a raw score by
adding 10 and 123. A code score of 25 in subtraction can be transformed
to a raw score by adding 25 and 4. The coded standard deviations for
chronological age and subtraction are the same as the raw standard
deviations since the height of the interval is only one. However, in the
case of mental age the coded standard deviation value of 4.4841 must be
multiplied by 3, the height of the interval, to obtain the raw standard
deviation.


Substituting values from Table I in the 
second half of the equation for the value 
F(x’, y’), the regression equation for the poor 
achievers in subtraction is obtained. Substi- 


tuting the values, the equation is 

* The values x' and y' represent some particular set of
fixed values of the variables x and y. The use of the prime
here has no ... designate the r's and σ's
of the excellent ...
[Figure I: scatter plot. Horizontal axis: mental age (X). Vertical axis: chronological age (Y). Legend: x, excellent; •, poor; CA, center of accuracy. The line of non-significance is drawn on the plot.]

Figure I. Comparison of excellent achievers (x) and poor achievers (•) in subtraction when matching is effected on chronological age, and on mental age as measured by The California Test of Mental Maturity, Elementary Series, Grades 4-8.


\[
u(x', y') = 20.733333 + \frac{(.103590) - (-.199112)(-.283369)}{1 - (-.199112)^2} \cdot \frac{5.836285}{4.642317} (x' - 10.066667) + \frac{(-.283369) - (-.199112)(.103590)}{1 - (-.199112)^2} \cdot \frac{5.836285}{5.975836} (y' - 14.022222).
\]
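The two substitutions can be checked numerically from the Table I values quoted in the text. The following Python sketch is illustrative only: the function name `adjusted_mean_equation` and its argument names are not from the article, and the standard deviation of mental age for the poor achievers is taken as 4.642317, the value used in Step Five.

```python
def adjusted_mean_equation(mean_dep, mean_x, mean_y, sd_dep, sd_x, sd_y,
                           r_x_dep, r_y_dep, r_xy):
    """Regression of the dependent score on x and y from summary statistics.

    Returns (intercept, slope_x, slope_y) of the two-predictor regression.
    """
    denom = 1.0 - r_xy ** 2
    slope_x = (r_x_dep - r_xy * r_y_dep) / denom * (sd_dep / sd_x)
    slope_y = (r_y_dep - r_xy * r_x_dep) / denom * (sd_dep / sd_y)
    intercept = mean_dep - slope_x * mean_x - slope_y * mean_y
    return intercept, slope_x, slope_y


# Excellent achievers (code scores quoted in the text).
excellent = adjusted_mean_equation(
    25.277778, 15.122222, 10.833333,
    0.954845, 4.484115, 4.674398,
    0.233412, 0.050203, -0.089145,
)

# Poor achievers (code scores quoted in the text).
poor = adjusted_mean_equation(
    20.733333, 10.066667, 14.022222,
    5.836285, 4.642317, 5.975836,
    0.103590, -0.283369, -0.199112,
)

# F(x', y') = a + b x' + c y' is the difference of the two equations.
a, b, c = (e - p for e, p in zip(excellent, poor))
print(excellent, poor, (a, b, c))
```

Working from summary statistics rather than raw scores mirrors the article's hand computation; with raw data an ordinary least-squares fit per group would give the same coefficients.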





March, 1942) 


Carrying through the computations, an estimate of the adjusted mean in subtraction for the poor achievers is obtained. This value is

\[ u(x', y') = 23.858492 + .061747x' - .267201y'. \]

The best estimate of the difference between adjusted means F(x', y') is obtained by subtracting u(x', y') from z(x', y'):

\[ F(x', y') = (24.347216 + .051061x' + .014622y') - (23.858492 + .061747x' - .267201y'); \]

\[ F(x', y') = .488724 - .010686x' + .281822y'. \]

Since the effect of mental age and chronological age on ability in subtraction is predicted from the regression equation, individual matching on the two characters is not necessary. The value of F(x', y') changes as the values of the basic characters change, and may be positive, negative, or zero. When all known values from a given set of data except those of x and y, which are variable, are substituted in the equation for F(x', y'), an equation of the form F(x', y') = a + bx + cy is obtained. When F(x', y') = 0, no difference between excellent and poor achievers exists with respect to ability in subtraction. The line represented by the equation a + bx + cy = 0 is thus called the line of non-significance. In this example the equation for the line of non-significance is

\[ F(x', y') = .488724 - .010686x' + .281822y' = 0. \]

This line is represented by oη on Figure I. The procedure in plotting this line will be explained later. Excellent achievers and poor achievers, whose coordinates in mental age and chronological age locate them upon this line of non-significance, may be expected to show the same achievement in subtraction. On one side of the line of non-significance, the value of F is favorable to excellent achievers; on the other side, it is favorable to the poor achievers, but not significantly so unless the cases fall within a region of significance.

Step Four. Obtain the absolute minimum of the sum of the squares. Compute the total variance multiplied by the degrees of freedom, \(N_1 + N_2 - 6\), where \(N_1\) is the number of excellent achievers, \(N_2\) is the number of poor achievers, and 6 is the number of statistical constants employed in testing the hypothesis H(x', y'). This estimate is designated as \(S_a^2\), where

\[
S_a^2 = N_1 \sigma'^{2}_z \, \frac{1 - r'^{2}_{xy} - r'^{2}_{xz} - r'^{2}_{yz} + 2 r'_{xy} r'_{xz} r'_{yz}}{1 - r'^{2}_{xy}}
+ N_2 \sigma''^{2}_u \, \frac{1 - r''^{2}_{xy} - r''^{2}_{xu} - r''^{2}_{yu} + 2 r''_{xy} r''_{xu} r''_{yu}}{1 - r''^{2}_{xy}}.
\]

Substituting the values from Table I, the equation becomes

\[
S_a^2 = 90(.954845)^2 \, \frac{1 - (-.089145)^2 - (.233412)^2 - (.050203)^2 + 2(-.089145)(.233412)(.050203)}{1 - (-.089145)^2}
+ 90(5.836285)^2 \, \frac{1 - (-.199112)^2 - (.103590)^2 - (-.283369)^2 + 2(-.199112)(.103590)(-.283369)}{1 - (-.199112)^2},
\]

\[ S_a^2 = 2889.504635. \]
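The value of the absolute minimum can be reproduced from the correlations and standard deviations quoted in Step Four. This is a minimal Python sketch; the helper name `group_residual_ss` is hypothetical, and each group's term is taken to be N times the variance of the subtraction score times one minus the squared multiple correlation on the two matching characters, which is how the printed substitution reads.

```python
def group_residual_ss(n, sd_dep, r_xy, r_x_dep, r_y_dep):
    """Residual sum of squares of the dependent score about its
    regression on x and y, computed from summary statistics."""
    numerator = (1.0 - r_xy ** 2 - r_x_dep ** 2 - r_y_dep ** 2
                 + 2.0 * r_xy * r_x_dep * r_y_dep)
    return n * sd_dep ** 2 * numerator / (1.0 - r_xy ** 2)


# Excellent achievers, then poor achievers (values quoted in the text).
ss_excellent = group_residual_ss(90, 0.954845, -0.089145, 0.233412, 0.050203)
ss_poor = group_residual_ss(90, 5.836285, -0.199112, 0.103590, -0.283369)

s_a_squared = ss_excellent + ss_poor
degrees_of_freedom = 90 + 90 - 6
print(s_a_squared, degrees_of_freedom)
```

Almost all of the total comes from the poor achievers, whose subtraction scores are far more variable than those of the excellent achievers.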
Step Five. Compute the weighting factor (P + Q).

\[
P + Q = \frac{1}{N_1} \left\{ 1 + \frac{1}{1 - r'^{2}_{xy}} \left[ \frac{(x' - \bar{x}_1)^2}{\sigma'^{2}_x} - \frac{2 r'_{xy} (x' - \bar{x}_1)(y' - \bar{y}_1)}{\sigma'_x \sigma'_y} + \frac{(y' - \bar{y}_1)^2}{\sigma'^{2}_y} \right] \right\}
+ \frac{1}{N_2} \left\{ 1 + \frac{1}{1 - r''^{2}_{xy}} \left[ \frac{(x' - \bar{x}_2)^2}{\sigma''^{2}_x} - \frac{2 r''_{xy} (x' - \bar{x}_2)(y' - \bar{y}_2)}{\sigma''_x \sigma''_y} + \frac{(y' - \bar{y}_2)^2}{\sigma''^{2}_y} \right] \right\}
\]

The value of (P + Q) depends upon the values of the basic characters of matching, the x's and y's. Where x' and y' lie near the population means of x and y, the value of (P + Q) becomes small, since (P + Q) becomes \(1/N_1 + 1/N_2\) when \(\bar{x}_1 = \bar{x}_2 = x'\) and \(\bar{y}_1 = \bar{y}_2 = y'\). Substituting the values from Table I in the equation for (P + Q), the equation becomes

\[
P + Q = \frac{1}{90} \left\{ 1 + \frac{1}{1 - (-.089145)^2} \left[ \frac{(x' - 15.122222)^2}{(4.484115)^2} - \frac{2(-.089145)(x' - 15.122222)(y' - 10.833333)}{(4.484115)(4.674398)} + \frac{(y' - 10.833333)^2}{(4.674398)^2} \right] \right\}
+ \frac{1}{90} \left\{ 1 + \frac{1}{1 - (-.199112)^2} \left[ \frac{(x' - 10.066667)^2}{(4.642317)^2} - \frac{2(-.199112)(x' - 10.066667)(y' - 14.022222)}{(4.642317)(5.975836)} + \frac{(y' - 14.022222)^2}{(5.975836)^2} \right] \right\},
\]

\[ P + Q = .001094x'^2 + 2(.000131)x'y' + .000837y'^2 + 2(-.015508)x' + 2(-.011652)y' + .366918. \]

When all values except those of the variables x' and y' are substituted from the data, a quadratic equation of the form

\[ S_a^2 (P + Q) = A'x'^2 + 2B'x'y' + C'y'^2 + 2D'x' + 2E'y' + H' \]

is obtained in which A', B', C', D', E', and H' are functions of the correlation coefficients and the standard deviations. When \(S_a^2 (P + Q)\) is equated to any positive constant value, the equation of the ellipse E is obtained; this ellipse is variable in size, depending upon the value of the positive constant to which \(S_a^2 (P + Q)\) is equated. The center of the ellipse E is the point where (P + Q) is at a minimum. This point is called the center of accuracy and is denoted by CA. This is the point at which the variance of the best linear estimate of the difference between the means of the two groups attains its minimum value with respect to subtraction.

Before the location of the center of accuracy can be determined the value \(S_a^2 (P + Q)\) for this particular example must be computed. Substituting values from previous computations,

\[ S_a^2 (P + Q) = 2889.504635 \, [.001094x'^2 + 2(.000131)x'y' + .000837y'^2 + 2(-.015508)x' + 2(-.011652)y' + .366918], \]

\[ S_a^2 (P + Q) = 3.160748x'^2 + 2(.377586)x'y' + 2.417299y'^2 + 2(-44.810802)x' + 2(-33.669580)y' + 1060.210256. \]
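The six coefficients of the quadratic can be recovered by expanding the bracketed terms of (P + Q) group by group. The sketch below is one way to do this in Python; the function name `group_terms` is hypothetical, and the value of the absolute minimum is taken as printed in Step Four.

```python
def group_terms(n, mean_x, mean_y, sd_x, sd_y, r_xy):
    """One group's contribution to (P + Q), returned as the coefficients
    of x'^2, 2x'y', y'^2, 2x', 2y' and the constant term."""
    k = 1.0 / (n * (1.0 - r_xy ** 2))
    a = k / sd_x ** 2
    b = -k * r_xy / (sd_x * sd_y)
    c = k / sd_y ** 2
    d = -(a * mean_x + b * mean_y)
    e = -(b * mean_x + c * mean_y)
    h = 1.0 / n + a * mean_x ** 2 + 2.0 * b * mean_x * mean_y + c * mean_y ** 2
    return a, b, c, d, e, h


excellent = group_terms(90, 15.122222, 10.833333, 4.484115, 4.674398, -0.089145)
poor = group_terms(90, 10.066667, 14.022222, 4.642317, 5.975836, -0.199112)

# Coefficients of (P + Q) itself.
p_plus_q = tuple(e + p for e, p in zip(excellent, poor))

# Coefficients A', B', C', D', E', H' of S_a^2 (P + Q).
S_A_SQUARED = 2889.504635
A1, B1, C1, D1, E1, H1 = (S_A_SQUARED * t for t in p_plus_q)
print(p_plus_q)
print(A1, B1, C1, D1, E1, H1)
```

The linear coefficients of (P + Q) come out near -.0155 and -.0117, which is what the later values D' = -44.810802 and E' = -33.669580 require.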
Therefore,

A' = 3.160748,
B' = .377586,
C' = 2.417299,
D' = -44.810802,
E' = -33.669580,
H' = 1060.210256.

The coordinates of the center of accuracy, \(x_0\) and \(y_0\), may now be determined by use of the following equations:

\[ x_0 = \frac{B'E' - C'D'}{M}, \qquad y_0 = \frac{B'D' - A'E'}{M}, \]

where \(M = A'C' - B'^2\). Hence,

\[ M = (3.160748)(2.417299) - (.377586)^2, \]
\[ M = 7.497902; \]

\[ x_0 = \frac{(.377586)(-33.669580) - (2.417299)(-44.810802)}{7.497902}, \]
\[ x_0 = 12.751293; \]

\[ y_0 = \frac{(.377586)(-44.810802) - (3.160748)(-33.669580)}{7.497902}, \]
\[ y_0 = 11.936823. \]

These coordinates, \(x_0\) and \(y_0\), are plotted in Figure 1 and their point of intersection is denoted by CA, the center of accuracy.

Step Six. Determine whether a region exists wherein the value of F(x', y') is significantly different from zero. To do this, first, establish the levels of significance at which the hypothesis H(x', y') will be accepted or rejected. These levels of significance will be denoted as \(w_{.01}\) and \(w_{.05}\). Second, compute the observed w; this is compared with the established levels of significance, \(w_{.01}\) and \(w_{.05}\), to determine if a region of significance exists.

The values of \(w_{.01}\) and \(w_{.05}\) are determined, respectively, from the following formulae:

\[ w_{.01} = \frac{n - s}{F_{.01}}, \qquad w_{.05} = \frac{n - s}{F_{.05}}, \]

where

\[ n_1 = 1, \]
\[ n_2 = (n - s) = N_1 + N_2 - 6 = 174. \]

Referring to Snedecor's F Tables,*

\[ F_{.01} = 6.80, \]
\[ F_{.05} = 3.90. \]

Therefore,

\[ w_{.01} = \frac{174}{6.80} = 25.588235, \]
\[ w_{.05} = \frac{174}{3.90} = 44.615385. \]

The observed w is obtained from the formula

\[ w_{obs.} = \frac{AH}{k^2 (H + \xi_0^2 A)}, \]

where

\[ A = \frac{(A'C' - B'^2)(b_1 c - b c_1)}{b_1^2 + c_1^2}, \qquad k^2 = \frac{(b_1 c - b c_1)^2}{b_1^2 + c_1^2}, \]

\[ H = D'x_0 + E'y_0 + H', \qquad \xi_0 = \frac{a + b x_0 + c y_0}{k}. \]

The values a, b, and c are obtained from the equation of the line of non-significance:

a = .488724,
b = -.010686,
c = .281822.

The values \(a_1\), \(b_1\), and \(c_1\) are obtained from the following formulae:

\[ a_1 = D'c - E'b, \]
\[ b_1 = A'c - B'b, \]
\[ c_1 = B'c - C'b. \]

Substituting values from the computed data,

\[ a_1 = (-44.810802)(.281822) - (-33.669580)(-.010686), \]
\[ a_1 = -12.988463; \]

\[ b_1 = (3.160748)(.281822) - (.377586)(-.010686), \]
\[ b_1 = .894803; \]

\[ c_1 = (.377586)(.281822) - (2.417299)(-.010686), \]
\[ c_1 = .132243; \]

\[ A = \frac{[(3.160748)(2.417299) - (.377586)^2] \, [(.894803)(.281822) - (-.010686)(.132243)]}{(.894803)^2 + (.132243)^2}, \]
\[ A = 2.322705; \]

\[ H = (-44.810802)(12.751293) + (-33.669580)(11.936823) + 1060.210256, \]
\[ H = 86.906773; \]

\[ k^2 = \frac{[(.894803)(.281822) - (-.010686)(.132243)]^2}{(.894803)^2 + (.132243)^2}, \]
\[ k^2 = .078600, \text{ and } k = .280356; \]

\[ \xi_0 = \frac{(.488724) + (-.010686)(12.751293) + (.281822)(11.936823)}{.280356}, \]
\[ \xi_0 = 13.256464, \text{ and } \xi_0^2 = 175.733838. \]

\(w_{obs.}\) may now be determined by substituting values from the computed data:

\[ w_{obs.} = \frac{(2.322705)(86.906773)}{(.078600) \, [(86.906773) + (175.733838)(2.322705)]}, \]
\[ w_{obs.} = 5.187352. \]

Having determined \(w_{obs.}\), it is now possible to determine if a region of significance exists. The rule to be followed is to accept the hypothesis, H(x', y'), when the observed w is larger than \(w_{.01}\) and to reject it when it is less than \(w_{.01}\). Since 5.187352 (\(w_{obs.}\)) is less than 25.588235 (\(w_{.01}\)), the hypothesis H(x', y') is rejected, and a region of significance exists at the one per cent level. Therefore, excellent achievers are significantly better than poor achievers with respect to ability in subtraction, when the basic matching characters, mental age and chronological age, are controlled statistically.

Step Seven. Determine the shape of the region of significance known to exist at the one per cent level. The shape of this region is determined by comparing the value of \(w_{.01} k^2\) with that of A. When \(w_{.01} k^2\) is greater than A the region is a hyperbola; if \(w_{.01} k^2\) is equal to A, the region is a parabola; and if \(w_{.01} k^2\) is less than A, the region is an ellipse. The value of \(w_{.01} k^2\) is

\[ (25.588235)(.078600) = 2.011235, \]

and the value of A is 2.322705. Since 2.011235 (\(w_{.01} k^2\)) is less than 2.322705 (A) the region of significance is an ellipse.

* Snedecor, George W. Statistical Methods. Ames, Iowa: Collegiate Press, 1938. P. 187.
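Steps Five to Seven chain several small formulas together, so a short script makes the dependencies easier to follow. The Python sketch below starts from the printed values of A' through H' and of a, b, c; the variable names are illustrative, and recomputed values agree with the printed ones only to about three significant figures in places, since the article rounds at intermediate stages.

```python
import math

# Printed coefficients of S_a^2 (P + Q) and of the line of non-significance.
A1, B1, C1 = 3.160748, 0.377586, 2.417299
D1, E1, H1 = -44.810802, -33.669580, 1060.210256
a, b, c = 0.488724, -0.010686, 0.281822
n_minus_s = 174          # N1 + N2 - 6
F_01 = 6.80              # tabled F for 1 and 174 degrees of freedom

# Center of accuracy.
M = A1 * C1 - B1 ** 2
x0 = (B1 * E1 - C1 * D1) / M
y0 = (B1 * D1 - A1 * E1) / M

# Diameter of the ellipse: a1 + b1 x + c1 y = 0.
a1 = D1 * c - E1 * b
b1 = A1 * c - B1 * b
c1 = B1 * c - C1 * b

# Quantities entering the observed w.
cross = b1 * c - b * c1
A = M * cross / (b1 ** 2 + c1 ** 2)
k2 = cross ** 2 / (b1 ** 2 + c1 ** 2)
H = D1 * x0 + E1 * y0 + H1
xi0 = (a + b * x0 + c * y0) / math.sqrt(k2)
w_obs = A * H / (k2 * (H + xi0 ** 2 * A))

# Decision rule and shape of the region, as stated in Steps Six and Seven.
w_01 = n_minus_s / F_01
region_exists = w_obs < w_01
if w_01 * k2 > A:
    shape = "hyperbola"
elif w_01 * k2 == A:
    shape = "parabola"
else:
    shape = "ellipse"
print(x0, y0, w_obs, w_01, region_exists, shape)
```

The exact-equality test for the parabola case is kept only to mirror the text; with floating-point input it would almost never be taken.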


Step Eight. Construct a graph and locate 
upon it the coordinates of excellent and poor 
achievers in terms of mental age and chrono- 
logical age, the line of non-significance, the 
diameter of the ellipse, the center of accu- 
racy, and the ellipse, as the final step in the 
analysis. 

First, construct a graph* on which the 
abscissa (x) represents mental age, and the 
ordinate (y) represents chronological age. 

Second, locate upon this graph the coordinate for each excellent and each poor achiever in terms of mental age (x) and chronological age (y).

Third, locate the line of non-significance (oη). This line is represented by the equation a + bx + cy = 0. In the present analysis the equation is

\[ F(x', y') = .488724 - .010686x' + .281822y' = 0. \]

The coordinates of x and y which determine the location of the line of non-significance (oη) are found by substituting known values of y in the equation F(x', y') = 0. In substituting, the values are as follows:

when y = 0, x = 45.73;
     y = -1, x = 19.36;
     y = -2, x = -7.01;
     y = -3, x = -33.38.

Since all values used in this analysis are in terms of code scores, and since chronological age and mental age in Figure 1 are in terms of actual scores, the code scores appear opposite their respective raw scores. Therefore, while the line of non-significance (oη) has been plotted in Figure 1 according to code scores, its position is also exact in relation to actual scores.

* See Figure 1.
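The plotting points for the line of non-significance, and their raw-score equivalents, can be generated directly from the equation. This is a small Python sketch; the function names are illustrative, and the code-to-raw conversion uses the transformation stated earlier in the article (mental age: multiply by 3 and add 109; chronological age: add 123).

```python
a, b, c = 0.488724, -0.010686, 0.281822


def x_on_line(y_code):
    """Solve a + b*x + c*y = 0 for x at a chosen code value of y."""
    return -(a + c * y_code) / b


def to_raw(x_code, y_code):
    """Convert code scores to raw scores in months."""
    return 3 * x_code + 109, y_code + 123


points = [(x_on_line(y), y) for y in (0, -1, -2, -3)]
for x_code, y_code in points:
    print(round(x_code, 2), y_code, to_raw(x_code, y_code))
```

Because the coefficient of x is small compared with that of y, the line is nearly parallel to the mental-age axis, which is why x changes by about 26 code units for each unit change in y.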

Fourth, locate the diameter of the ellipse (oξ). This line is represented by the equation \(a_1 + b_1 x + c_1 y = 0\). In the present analysis the equation is

\[ o\xi = -12.988463 + .894803x' + .132243y' = 0. \]

The coordinates, x and y, which determine the diameter of the ellipse (oξ), are found by substituting known values of x in the foregoing equation. In substituting, the values are as follows:

when x = 10, y = 30.55;
     x = 11, y = 23.79;
     x = 12, y = 17.02;
     x = 13, y = 10.25.

The diameter of the ellipse (oξ) always passes through the center of accuracy, and the distance from the center of accuracy to the point of intersection of the oξ and oη lines is represented by the value \(\xi_0\), which is 13.256464. Since all values used in this analysis are in terms of code scores, and since chronological age and mental age in Figure 1 are in terms of actual scores, the code scores are presented opposite their respective raw scores. Therefore, while the diameter of the ellipse (oξ) has been plotted in Figure 1 according to code scores, its position is also exact in relation to actual scores.

Fifth, transform the coordinates, taking the line of non-significance (oη) as the new axis of ordinates, and the diameter of the ellipse (oξ) as the new axis of abscissae. These new axes need not be at right angles to each other. Values measured parallel to oξ are positive on the side of the line of non-significance (oη) where the value of F is positive, and negative on the other side of the line oη. The sign of ξ indicates the direction along the line oξ which is considered positive or negative.

Sixth, locate the region of significance, the ellipse. The equation for the region of significance is

\[ w_{.01} k^2 \xi^2 - A(\xi - \xi_0)^2 - C\eta^2 - H = 0. \]

This equation is the locus of points on the edge of the region of significance. Computing the value C,

\[ C = \frac{b_1 c - b c_1}{b^2 + c^2} = \frac{(.894803)(.281822) - (-.010686)(.132243)}{(-.010686)^2 + (.281822)^2}, \]
\[ C = 3.188273, \]

and substituting values previously computed, the equation becomes

\[ 2.011235\xi^2 - 2.322705(\xi - 13.256464)^2 - 3.188273\eta^2 - 86.906773 = 0; \]
\[ -.311470\xi^2 + 61.581710\xi - 495.084637 - 3.188273\eta^2 = 0. \]

Dividing through by (-.311470), the equation becomes

\[ \xi^2 - 197.713134\xi + 1589.509862 + 10.236212\eta^2 = 0. \]

Completing the square, the equation becomes

\[ (\xi^2 - 197.713134\xi + 9772.620837) + 10.236212\eta^2 = 9772.620837 - 1589.509862, \]
\[ (\xi^2 - 197.713134\xi + 9772.620837) + 10.236212\eta^2 = 8183.110927, \]
\[ (\xi - 98.856567)^2 + 10.236212\eta^2 = 8183.110927. \]

Dividing through by (8183.110927), the equation becomes

\[ \frac{(\xi - 98.856567)^2}{8183.110927} + \frac{\eta^2}{\dfrac{8183.110927}{10.236212}} = 1, \]
\[ \frac{(\xi - 98.856567)^2}{8183.110927} + \frac{\eta^2}{799.427657} = 1. \]

Transposing,

\[ \frac{\eta^2}{(28.274152)^2} = 1 - \frac{(\xi - 98.856567)^2}{8183.110927}. \]

Extracting the square root of both sides,

\[ \frac{\eta}{\pm 28.274152} = \sqrt{1 - \frac{(\xi - 98.856567)^2}{8183.110927}}. \]

Multiplying both sides by (±28.274152),

\[ \eta = \pm 28.274152 \sqrt{1 - \frac{(\xi - 98.856567)^2}{8183.110927}}. \]

The coordinates, η and ξ, which determine the shape of the ellipse, are found by substituting known values of ξ in the foregoing equation. In substituting, the values are as follows:

ξ = 10, η = ±4.81;
ξ = 13, η = ±8.63;
ξ = 18, η = ±12.50;
ξ = 28, η = ±17.46;
ξ = 98.856567, η = ±28.274152 (center of ellipse).

The boundary of the ellipse, which represents the region of significance, is plotted as follows: begin at the intersection of the lines oη and oξ and measure 10 positive units along the line oξ; here ξ = 10, and η = ±4.81. Now measure 4.81 units on either side of the line oξ and mark points. These points must be at the extremities of a line parallel to the line oη. Additional points, needed for locating the boundary of the ellipse, are determined in the same manner by using the remaining values of ξ and η. When an adequate number of points has been located, connect the points to form the boundary of the ellipse. The center of the ellipse is the point at which the ellipse is at its widest.

Seventh, interpret the regions of significance. The position of the region of significance, the ellipse, is plotted in Figure I. Above the line of non-significance the value is favorable to excellent achievers. This value is significant for that part of the population which is enclosed by the ellipse. Therefore, in general, pupils having mental ages of 109 to 192 months and chronological ages of 130 to 156 months, may be expected to do significantly better in subtraction if they are excellent achievers in division rather than poor achievers in division.

Step Nine. Since the hypothesis H(x', y') that there is no difference between the means of excellent and poor achievers with respect to ability in subtraction when adjusted for variation in basic characters of matching has been rejected, it is also of value to determine from the data the best estimate they provide with respect to the true difference between the means of excellent and poor achievers. This estimate can be obtained by substituting in the equation,

\[ F(x', y') = .488724 - .010686x' + .281822y', \]

the values of the coordinates of the center of accuracy which in this case are:

\[ x_0 = 12.751293, \qquad y_0 = 11.936823. \]

The equation then becomes

\[ F(x_0, y_0) = .488724 - (.010686)(12.751293) + (.281822)(11.936823), \]
\[ F(x_0, y_0) = 3.716529. \]

It can, therefore, be said that the best estimate of the true difference between the means of excellent and poor achievers in subtraction, when mental age and chronological age are controlled statistically, is 3.716529 in favor of excellent achievers.

The variance of the best estimate of the true difference between the means of excellent and poor achievers may be estimated from:

\[ V_F = \frac{S_a^2 F^2}{(n - s) S_b^2}. \]

All values necessary for solving this equation have been computed with the exception of \(S_b^2\). The value of \(S_b^2\) is found by dividing \(F^2\) by P + Q. Substituting computed values,

\[ S_b^2 = \frac{[.488724 - .010686x' + .281822y']^2}{[.001094x'^2 + 2(.000131)x'y' + .000837y'^2 + 2(-.015508)x' + 2(-.011652)y' + .366918]}. \]

Substituting the value of \(x_0\) for x' and the value of \(y_0\) for y':

\[ S_b^2 = \frac{[.488724 - .010686(12.751293) + .281822(11.936823)]^2}{[.001094(12.751293)^2 + 2(.000131)(12.751293)(11.936823) + .000837(11.936823)^2 + 2(-.015508)(12.751293) + 2(-.011652)(11.936823) + .366918]}, \]

\[ S_b^2 = 72.601539. \]

Substituting computed values, the variance of the best estimate of the true difference between the means of excellent and poor achievers is

\[ V_F = \frac{(2889.504635)(13.812589)}{(180 - 6)(72.601539)}, \]
\[ V_F = 3.159391. \]

The standard error of the best estimate of the true difference between the means of excellent and poor achievers is the square root of \(V_F\), or 1.777468.
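The reduction of the region of significance to standard form in the sixth part of Step Eight (dividing through, completing the square) follows a fixed pattern, sketched here in Python. The function names are illustrative rather than the article's, and the inputs are the printed values of \(w_{.01} k^2\), A, \(\xi_0\), C and H.

```python
import math


def ellipse_standard_form(wk2, A, xi0, C, H):
    """Rewrite wk2*xi^2 - A*(xi - xi0)^2 - C*eta^2 - H = 0 as
    (xi - center)^2 + eta_coeff * eta^2 = rhs.

    Only valid when wk2 < A, the elliptical case of Step Seven.
    """
    q = A - wk2
    if q <= 0:
        raise ValueError("region is not an ellipse")
    center = A * xi0 / q
    rhs = center ** 2 - (A * xi0 ** 2 + H) / q
    eta_coeff = C / q
    return center, rhs, eta_coeff


def eta_at(xi, center, rhs, eta_coeff):
    """Positive boundary ordinate at abscissa xi (the negative one mirrors it)."""
    return math.sqrt((rhs - (xi - center) ** 2) / eta_coeff)


center, rhs, eta_coeff = ellipse_standard_form(
    2.011235, 2.322705, 13.256464, 3.188273, 86.906773
)
half_width_at_center = eta_at(center, center, rhs, eta_coeff)
print(center, rhs, eta_coeff, half_width_at_center)
```

Calling `eta_at` for a list of ξ values gives the pairs of boundary points that are then marked off on either side of the line oξ.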


SUMMARY 

In summary, the special merits of the technique described here are:

1. The experiment has the property of
being self-contained.

2. No loss of information results since all
observational values are used.

3. The populations for which it is permissible
to generalize from the sample information
are rigorously defined.

4. The problem of the test of significance
and the problem of estimation are both
solved.





THE APPLICABILITY OF THE SPEARMAN-BROWN FORMULA 
TO TEACHERS’ MARKS IN COLORADO STATE 
COLLEGE OF EDUCATION¹


LORAINE BRUCE 


Pampa Senior High School 
Pampa, Texas 


1. PROBLEM 


The object of this study is to determine 
the applicability of the Spearman—Brown 
prophecy formula to college marks on the 
undergraduate and master’s levels in Colo- 
rado State College of Education. In order to 
do this, the reliability of teachers’ marks for 
two groups of students was computed: 
(1) students on the undergraduate level, and 
(2) students on the master’s level. The reli- 
ability of teachers’ marks for these two groups 
was predicted by the use of the Spearman— 
Brown formula. The correspondence between 
actual and predicted coefficients of reliability 
of teachers’ marks was determined for both 
groups. 

2. SUBJECTS 


The 209 students who first enrolled in 
Colorado State College of Education in Sep- 
tember, 1935 and in September, 1936 and 
who remained in school twelve consecutive 
quarters made up the first group for whom 
complete and comparable records of teachers’ 
marks were computed for each student. The 
second group was made up of the 230 stu- 
dents matriculating on the master’s level in 
Colorado State College of Education from 
June, 1936 to August, 1938 and who were in 
attendance a minimum of four quarters. 


3. PROCEDURES 


All records of teachers’ marks were com- 
puted in terms of point-hour ratios. The 
point-hour ratio was computed by dividing 
the total number of points by the total num- 
ber of quarter-hours. The points were found 
by multiplying the number of hours of each 
course by the point value of the letter grades. 
The point value of "A" is 5 and of "B" is 4; the three
lower letter grades carry 3, 2, and 1 points, respectively.


1 Summary of Field Study No. 3. Greeley, Colorado: Colo- 
rado State College of Education, 1941. 


The Spearman-Brown prophecy formula,²

\[ r_n = \frac{n r_{11}}{1 + (n - 1) r_{11}}, \]

was used in this study as a measure of the effect of cumulated marks. In this formula rₙ is the coefficient of reliability which would probably result from any given number of quarters, and r₁₁ is the coefficient for two quarters. The probable error of each coefficient estimated by this formula was computed by Eugene Shen's formula,³ which is

\[ PE_{r_n} = \frac{.6745 \, n (1 - r_{11}^2)}{\sqrt{N} \, [1 + (n - 1) r_{11}]^2}. \]

In this probable error formula, r₁₁ is the original reliability coefficient, n the number of times the quarters were increased, and N is the number of students.
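Both formulas are easy to apply by machine. The following Python sketch uses hypothetical function names and, as a worked case, the first-and-third-quarter coefficient of .664 for the 209 undergraduates reported in the Results, with the number of quarters doubled.

```python
import math


def spearman_brown(r11, n):
    """Predicted reliability when the amount of material is multiplied by n."""
    return n * r11 / (1.0 + (n - 1.0) * r11)


def shen_probable_error(r11, n, n_students):
    """Probable error of a coefficient predicted by the Spearman-Brown formula."""
    return (0.6745 * n * (1.0 - r11 ** 2)
            / (math.sqrt(n_students) * (1.0 + (n - 1.0) * r11) ** 2))


predicted = spearman_brown(0.664, 2)
pe_predicted = shen_probable_error(0.664, 2, 209)
print(round(predicted, 3), round(pe_predicted, 3))
```

With n = 1 the second function reduces to the ordinary probable error of a correlation coefficient, .6745(1 - r²)/√N, which is a convenient check on the implementation.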

Comparisons were made between the obtained and predicted coefficients of reliability of teachers' marks by finding the differences and probable errors of the differences between them. In the main, the formula⁴ which was used for the probable error of the difference was

\[ PE_{(r_1 - r_2)} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2}. \]

This is the probable error of the difference 
between two coefficients of correlation calcu- 
lated from independent random samples. 
Since the coefficients were not derived from 
independent samples, the probable errors 
computed for their differences were not ap- 
plicable to this study; the values of these 
probable errors were too large to be reliable. 
If the probable errors thus computed indi- 
cated significant differences between the ob- 
tained and predicted coefficients, then com- 
putation by a formula which gave a smaller 


* Odell, C. W. Statistical Method in Education. New York:
D. Appleton-Century Company, Inc., 1935. P. 210.
* Garrett, H. E. Statistics in Psychology and Education.
New York: Longmans, Green and Company, 1938. P. 316.
probable error would only show additional
assurance.
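The short formula and the resulting critical ratio can be sketched as follows. The Python names are illustrative; the threshold of four is the value the Results section treats as required for a significant difference, and the figures are those reported there for two quarters (predicted .798 ± .019, obtained .726 ± .022).

```python
import math


def pe_of_difference(pe_1, pe_2):
    """Short formula: probable error of the difference between two
    coefficients treated as if they came from independent samples."""
    return math.hypot(pe_1, pe_2)


def critical_ratio(difference, pe_difference):
    """Difference divided by its probable error."""
    return difference / pe_difference


pe_diff = round(pe_of_difference(0.019, 0.022), 3)
ratio = critical_ratio(0.798 - 0.726, pe_diff)
is_significant = ratio >= 4.0
print(pe_diff, round(ratio, 2), is_significant)
```

Because the two coefficients here come from the same students, this probable error overstates the true one, which is the limitation the surrounding paragraph describes.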

The z-transformation was applied to the 
coefficients predicted from the correlation be- 
tween the first and third quarters of under- 
graduate work, in order to determine whether 
this computation would change the differences 
between the obtained and predicted coeffi- 
cients evaluated in terms of the short formula. 
According to Lindquist,⁵ this method for determining
the significance of a difference is
valid only if the coefficients are obtained 
from independent random samples. 

No test for the significance of a difference 
between obtained and predicted correlation 
coefficients has yet been devised. Such a test 
is especially needed by the research student 
in education. There is a longer formula for 
the probable error of the difference between 
two coefficients in which the two arrays are 
different, but in which the arrays are corre- 
lated with one another, which is probably 
more suited to this study. This formula⁶ is

\[ PE_d = \sqrt{PE_{r_{12}}^2 + PE_{r_{34}}^2 - 2 \, r_{r_{12} r_{34}} \, PE_{r_{12}} PE_{r_{34}}}, \]

in which

\[
r_{r_{12} r_{34}} = \frac{(r_{13} - r_{12} r_{23})(r_{24} - r_{23} r_{34}) + (r_{14} - r_{13} r_{34})(r_{23} - r_{12} r_{13}) + (r_{13} - r_{14} r_{34})(r_{24} - r_{12} r_{14}) + (r_{14} - r_{12} r_{24})(r_{23} - r_{24} r_{34})}{2 (1 - r_{12}^2)(1 - r_{34}^2)}.
\]

In this formula for the probable error, r₁₂ is for the obtained r and r₃₄ for the predicted r. The usefulness of this formula was impaired by the fact that the data for the correlations r₁₃, r₁₄, r₂₃, and r₂₄ were not available. The four unknown r's were assumed to be equal to the average of the obtained and predicted coefficients; this is an unsupported assumption. In several instances this longer formula was used in order to determine its effect on the significance of differences obtained for the short formula. The differences for the other coefficients were either reliable by the short formula or were so small that the use of the long formula would not have shown them to be reliable.


4. RESULTS


Four product-moment coefficients of correlation for undergraduates were substituted in the Spearman-Brown prophecy formula and were obtained from the point-hour ratios for the following quarters: (1) first and third quarters; (2) first and second quarters; (3) the sum of the first and second and the sum of the third and fourth quarters; and (4) the sum of the first and third and the sum of the second and fourth quarters. These reliability coefficients were: r₁₃ = .664 ± .026, r₁₂ = .651 ± .027, r(1+2)(3+4) = .725 ± .022, and r(1+3)(2+4) = .766 ± .019.

In order to study the reliability of marks throughout the twelve quarters of attendance, ten other coefficients of correlation between single quarters were computed. These correlations range from .478 ± .036 between the tenth and twelfth quarters to .731 ± .022 between the fifth and sixth quarters. The highest correlation was found for the ratios of the second and third quarters of the sophomore year; the next two highest were the two which were substituted in the Spearman-Brown formula. The lowest coefficients were between the junior and senior years.

* Lindquist, E. F. Statistical Analysis in Educational Re...
* ... Hill Book Company, Inc., 1940. P. ...
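The longer formula described in the Procedures section can be sketched in Python as below. Two things are assumptions on my part rather than statements from the article: the function names, and the denominator 2(1 - r₁₂²)(1 - r₃₄²), which follows the usual statement of this formula for the correlation between two correlated coefficients. The worked case uses the two-quarter comparison (obtained .726 ± .022, predicted .798 ± .019) and the article's own stopgap of setting the four unknown correlations equal to the average of the obtained and predicted values.

```python
import math


def correlation_between_correlations(r12, r34, r13, r14, r23, r24):
    """Correlation between the sampling errors of r12 and r34."""
    numerator = ((r13 - r12 * r23) * (r24 - r23 * r34)
                 + (r14 - r13 * r34) * (r23 - r12 * r13)
                 + (r13 - r14 * r34) * (r24 - r12 * r14)
                 + (r14 - r12 * r24) * (r23 - r24 * r34))
    return numerator / (2.0 * (1.0 - r12 ** 2) * (1.0 - r34 ** 2))


def pe_of_difference_long(pe12, pe34, r_between):
    """Probable error of the difference between two correlated coefficients."""
    return math.sqrt(pe12 ** 2 + pe34 ** 2 - 2.0 * r_between * pe12 * pe34)


obtained, predicted = 0.726, 0.798
assumed = (obtained + predicted) / 2.0   # stand-in for r13, r14, r23, r24

r_between = correlation_between_correlations(
    obtained, predicted, assumed, assumed, assumed, assumed
)
pe_long = pe_of_difference_long(0.022, 0.019, r_between)
pe_short = pe_of_difference_long(0.022, 0.019, 0.0)
print(round(r_between, 3), round(pe_long, 3), round(pe_short, 3))
```

Setting the correlation term to zero recovers the short formula, so the long formula can only shrink the probable error when the two coefficients are positively related, which is why it raises the critical ratios.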


Predicted r's were found by substituting in the Spearman-Brown formula r₁₃ for r₁₁ and two, three, four, five, and six, respectively, for n. Similar r's were found by replacing r₁₃ by r₁₂. The coefficients, r(1+2)(3+4) and r(1+3)(2+4), were substituted in the prophecy formula and one and one-half, two, two and one-half, and three were successively substituted for n for each of these coefficients.

The product-moment coefficients of correlation between the total point-hour ratios for two series of three, four, five, and six quarters each were computed. Each series was compared with the predicted coefficients based on the initial r's to which each corresponded.

The results of substituting r₁₃ in the prophecy formula were equivocal. These results are shown in Table Ia. The difference between the predicted coefficient, .798 ± .019, which was the result of substituting two for n, and the obtained r(1+2)(3+4) = .726 ± .022, was .072 ± .029; this is not a significant difference. When the long formula
TABLE I

COMPARISONS OF OBTAINED AND PREDICTED COEFFICIENTS OF RELIABILITY OF POINT-HOUR RATIOS
OF 209 UNDERGRADUATE STUDENTS WHO FIRST ENROLLED IN COLORADO STATE COLLEGE
OF EDUCATION IN SEPTEMBER, 1935 AND IN SEPTEMBER, 1936, AND WHO
WERE IN ATTENDANCE FOR TWELVE CONSECUTIVE QUARTERS

                                        r obt.   r pred.   Diff.*   PE diff.    Diff./PE diff.
1                                       2        3         4        5           6

a. Reliabilities as Predicted from r₁₃ = .664 ± .026

r(1+2)(3+4)                             ...      ...       .072     .029        2.48
                                                                    **(.024)    **(3.00)
r(1+2+5)(3+4+7)                         ...      ...       ...      .026        4.69
r(1+2+5+6)(3+4+7+8)                     ...      ...       ...      .015        .47
r(1+2+5+6+9)(3+4+7+8+11)                ...      ...       ...      .018        ...
r(1+2+5+6+9+10)(3+4+7+8+11+12)          ...      ...       ...      .020        ...

b. Reliabilities as Predicted from r₁₂ = .651 ± .027

r(1+3)(2+4)                             ...      ...       ...      .028        .82
r(1+3+5)(2+4+6)                         ...      ...       ...      .025        3.84
                                                                    **(.020)    **(4.80)
r(1+3+5+7)(2+4+6+8)                     ...      ...       ...      .018        2.33
r(1+3+5+7+9)(2+4+6+8+10)                ...      ...       ...      .021        4.71
r(1+3+5+7+9+11)(2+4+6+8+10+12)          ...      ...       ...      .016        4.12

* r pred. - r obt.
** Results obtained from the use of the long formula.


for the probable error of the difference be- 
tween the r’s was computed, the difference 
still did not indicate complete reliability. The 
critical ratios of the differences between the 
obtained and predicted scores for the totals 
of two quarters and of four quarters indicate 
that there is no significant difference. In 
these two cases, the correspondence is suffi- 
ciently close to assume that the prediction 
formula could be used to determine the reli- 
ability of amalgamated quarters of teachers’ 
marks. In the cases of the amalgamated 
marks of three, five, and six quarters, the 
critical ratios of four and larger indicate with positive assurance that the estimated reliabilities are too large. The z-transformation was applied to the results derived from r₁₃. There were no appreciable differences when the differences between the obtained and predicted coefficients were evaluated in terms of the short formula and when they were evaluated in terms of the z-transformation.

When r₁₂ was used as the reliability coefficient, the critical ratios indicate practically the same results as when r₁₃ was used for the initial r. These results are shown in Table Ib.


In the case of two and four quarters, the 
prophecy formula can be used to predict re- 
liability coefficients, and in the other three 
cases, the formula overpredicts decidedly. In 
the case of three quarters, the critical ratio 
was slightly under the four required for a 
significant difference. When the long formula 
for the probable error of the difference was 
used, the critical ratio was increased to 4.80. 
This ratio indicates virtual certainty that 
the estimated reliability was overpredicted. 

The coefficients predicted from 7,, + 2) 3 + 4)» 
which are shown in Table Ila, indicate that 
the only significant difference was that for 
six quarters. For the total of four quarters, 
the obtained coefficient was larger than the 
predicted coefficient by an almost reliable 
difference. With r,, as the initial r, the ob- 
tained coefficient for a total of four quarters 
was larger than the predicted reliability, but 
the difference was not reliable. When 
Tos +52 +4) Was used as the initial r, as is 
shown in Table IIb, the coefficients of the 
totals for three, five, and six quarters were 
overpredicted by reliable or almost reliable 
differences; all of these coefficients were
overpredicted by reliable differences when
the long formula was used.

The divergence from normality was measured for all the
distributions involved in computing the four reliability
coefficients for the undergraduate ratios; the same measures
were made for two of the distributions which were made up of
totals of six quarters. With the exception of the kurtosis of
the distribution of the first quarter and both the skewness and
kurtosis of the distribution of the third quarter, the
divergence of the distributions from the ideal normal curve is
the result of chance fluctuations. Because of departure of the
distributions from normality, the obtained and predicted
coefficients based on r13, and to some extent those based on
r12, yield less reliable results than coefficients based on
more symmetrical distributions.


March, 1942]          APPLICABILITY OF SPEARMAN-BROWN FORMULA


TABLE II

COMPARISONS OF OBTAINED AND PREDICTED COEFFICIENTS OF RELIABILITY OF POINT-HOUR RATIOS
OF 209 UNDERGRADUATE STUDENTS WHO FIRST ENROLLED IN COLORADO STATE COLLEGE
OF EDUCATION IN SEPTEMBER, 1935 AND IN SEPTEMBER, 1936, AND WHO
WERE IN ATTENDANCE FOR TWELVE CONSECUTIVE QUARTERS

                                                                              Diff./
                                       r obt.   r pred.   Diff.*   PE diff.   PE diff.
1                                        2         3        4         5          6

a. Reliabilities as Predicted from r(1+2)(3+4) = .725 ± .022
r(1+2+5)(3+4+7)                         .734     .799     .065      .028       2.32
r(1+2+5+6)(3+4+7+8)                     .895     .841     .054      .017       3.18
                                       ±.009    ±.015             **(.014)   **(3.86)
r(1+2+5+6+9)(3+4+7+8+11)                .836     .869     .033      .020       1.65
                                       ±.015    ±.013
r(1+2+5+6+9+10)(3+4+7+8+11+12)          .779     .888     .109      .021       5.19

b. Reliabilities as Predicted from r(1+3)(2+4) = [illegible]
r(1+3+5)(2+4+6)                         .752     .831     .079      .025       3.16
                                       ±.020    ±.015             **(.019)   **(4.16)
r(1+3+5+7)(2+4+6+8)                     .840     .868     .028      .018       1.56
                                       ±.014    ±.012
r(1+3+5+7+9)(2+4+6+8+10)                .804     .891     .087      .020       4.35
                                       ±.017    ±.010
r(1+3+5+7+9+11)(2+4+6+8+10+12)          .851     .908     .057      .016       3.56
                                       ±.013    ±.009             **(.012)   **(4.75)

* r pred. - r obt.
** Results obtained from the use of the long formula.


On the master’s level, the coefficients of reliability were
computed between the point-hour ratios for the following
quarters of work: the first and third quarters, r13 = .393 ±
.038, and the first two quarters, r12 = .448 ± .036. These two
coefficients were substituted in the prophecy formula and two
was substituted for n. The differences between the obtained and
predicted coefficients are not reliable. The results are shown
in Table III.


TABLE III

COMPARISONS OF OBTAINED AND PREDICTED COEFFICIENTS OF RELIABILITY OF POINT-HOUR RATIOS
OF 230 STUDENTS ON THE MASTER’S LEVEL AT COLORADO STATE COLLEGE OF EDUCATION

                                                                              Diff./
                                       r obt.   r pred.   Diff.*   PE diff.   PE diff.
1                                        2         3        4         5          6

a. Reliability as Predicted from r13 = .393 ± .038
[illegible]                             .461     .564     .103      .052       1.98
                                       ±.035    ±.039

b. Reliability as Predicted from r12 = .448 ± .036
[illegible]                          [illegible] [illegible] .063   .045       1.40
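
The arithmetic behind the predicted coefficients can be checked with a short sketch. The function names `spearman_brown` and `critical_ratio` are hypothetical, chosen only for this illustration. The probable error of the difference is assumed here to be the square root of the sum of the two squared probable errors; this assumption reproduces the tabled PE diff. values closely, but it may not be the exact "short formula" the study used. Substituting r = .393 and n = 2 gives about .564, the predicted coefficient reported for the master's level.

```python
import math


def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a measure is lengthened n times."""
    return n * r / (1.0 + (n - 1.0) * r)


def critical_ratio(r_obt: float, pe_obt: float,
                   r_pred: float, pe_pred: float) -> float:
    """Difference between two coefficients divided by its probable error.

    Assumption: PE of the difference = sqrt(PE_obt**2 + PE_pred**2).
    """
    pe_diff = math.sqrt(pe_obt ** 2 + pe_pred ** 2)
    return abs(r_pred - r_obt) / pe_diff


# Master's level: first and third quarters, r = .393, length doubled.
predicted = spearman_brown(0.393, 2)
ratio = critical_ratio(0.461, 0.035, predicted, 0.039)
print(round(predicted, 3), round(ratio, 2))
```

The ratio computed this way is slightly below the tabled 1.98 because the table divides by a probable error already rounded to three places.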
5. CONCLUSIONS


According to the results of this study, the 
reliability of teachers’ marks, excepting those 
for the amalgamated ratios of two and four 
quarters, cannot be successfully predicted by 
the use of the Spearman-Brown prophecy
formula. Evidently teachers’ marks do not
satisfy the assumptions upon which the 
prophecy formula is based as to comparable 
and similar forms of measures, and conse- 
quently its application is of doubtful validity. 

The results indicate that reliability coeffi- 
cients of amalgamated quarters of marks may 
be predicted for totals of two and four quar- 
ters; the use of any other totals would result 
in overprediction. 

However, the results obtained in this in- 
vestigation should be accepted with some 
reservations because: (1) the students were 
not a random sample of college students; 
(2) as the period of time during which the
point-hour ratios were obtained extended over
four years, the changes in the student may
have rendered the use of the Spearman-
Brown formula inappropriate; (3) several
distributions of point-hour ratios departed
more widely from normality than could be
accounted for by chance; (4) no adequate
formula for computing the probable error of 
the difference between obtained and predicted 
coefficients was available for use with the 
data of this study. It is highly probable that 
if such a formula could have been applied, a 
larger number of reliable differences would 
have been found. On account of different 
grading systems and methods of determining 
marks in different educational institutions, it 
is not likely that a random sample of college 
students could have been used satisfactorily 
in such an investigation. 





THE RELATION OF PRIMARY MENTAL ABILITIES TO 
SCHOLASTIC SUCCESS IN PROFESSIONAL SCHOOLS 


Dewey B. Stuit and Harry H. Hudson
University of Iowa 


The present study is a part of a larger primary abilities
testing program sponsored by the American Council on Education
in ten universities in the spring of 1939. The primary purpose
of that investigation was to determine the relation of primary
ability test scores to vocational choices. The chief outcome of
the study was the publication by Adkins¹ of vocational group
profiles on the primary mental abilities for professional
students in 12 occupational fields. No attempt was made to
study the relationship between performance in the tests and
academic achievement as measured by grades. The major purpose
of the present study is to present data bearing on the latter
point and also some supplementary evidence on the group
profiles.

The University of Iowa’s contribution to the American Council
on Education study was to administer the A. C. E. Tests for
Primary Mental Abilities² to students in engineering, medicine,
and journalism. The engineering group was composed of 27
students, and the journalism and medical groups were each
composed of 29 students, making a total of 85 subjects for the
study. The engineering and journalism grade point averages³
were based on seven semesters of work and the medical averages
were compiled from five semesters of work. All zero order
correlations were computed by the Pearson product-moment method
and all multiple correlations by the Wherry-Doolittle
procedure. The significance of correlation coefficients was
determined by use of the table of values required for
significance as presented by Lindquist.⁴


¹ Adkins, Dorothy C. “The Relation of Primary Mental Abilities
to Vocational Choice.” American Council on Education Studies,
Series 5, Vol. 4, No. 2. Pp. 39-53.

² For a description of the tests and the factors measured see
Thurstone, L. L. Manual of Instructions: Tests for Primary
Mental Abilities. Washington, D. C.: American Council on
Education, 1938. Pp. 12.

³ An A, B, C, D, F point system is used. In computing grade
point averages the following weights were employed:
[weights illegible]

⁴ Lindquist, E. F. Statistical Analysis in Educational
Research. New York: Houghton Mifflin, 1940. P. 212.


The correlations between the factor scores 
and grade point averages are presented in 
Table I. Significant correlations at the 5 per- 
cent level were found between the Verbal, 
Memory, Induction, and Deduction factors 
and engineering grade point averages. For 
the journalism group, significant correlations 
were found between the Perception and 
Verbal factors and the criterion. No signifi- 
cant correlations were found between the 
factor scores and the criterion for the med- 
icine group. 

The correlations presented in Table I may 
seem somewhat confusing and illogical. For 
example, one might have expected the Spatial 
factor to correlate more highly with scholastic 
success in engineering and the Memory factor 
to correlate more highly with grades in med- 
icine. In the interpretation of these data two 
important facts should be remembered. 
First, the groups are very small; consequently 
only limited confidence can be placed in the 
correlations. Second, since the members of 
each group are advanced students who have 
done reasonably satisfactory work, all may 
have sufficient amounts of the vitally impor-
tant factors to insure their success, and con-
sequently individual differences may not be
associated with variations in achievement.
There is also the possibility, of course, that 
defects in the tests or the criteria of scholastic 
success may be responsible for the low cor- 
relations. 

Using the Wherry-Doolittle test selection
method, a multiple correlation of .614 was
found between the criterion and the factor 
scores for the engineering group. Included in 
the selection was the Verbal factor, which 
correlated .577 with the criterion, and the 
Memory factor, which increased the multiple 
coefficient to .614. Additional factors added 
more chance error than they did validity to 
this coefficient. 

For the journalism group, it was found that the Verbal factor
correlated .505 with the criterion, the Memory factor increased
the multiple coefficient to .523, the Perception factor to
.567, and the Deduction factor to .594. Additional factors
added no validity to this multiple coefficient of correlation
of .594 between the factors and the criterion.


TABLE I

CORRELATIONS BETWEEN FACTOR SCORES AND GRADE POINT AVERAGES IN
THREE PROFESSIONAL SCHOOL GROUPS

                                  Factor
Group           P      N      V      S      M      I      D
Engineering    .150   .223   .577   .178   .563   .400   .385
Journalism     .427   .318   .505   .015   .232   .321   .057
Medicine       .353   .172   .151   .098  -.013  -.219   .142

V=Verbal   S=Spatial   P=Perception   N=Number
M=Memory   I=Induction   D=Deduction


A multiple coefficient of correlation of .416 was found between
the factors and the criterion for the medicine group. The
Perception factor correlated .353 with the criterion, the
Induction factor increased the correlation to .398, and the
inclusion of the Deduction factor in the selection yielded the
maximum multiple coefficient of .416, additional factors adding
no validity.

These multiple correlations suggest that tests of primary
mental abilities possess some promise in the prediction of
scholastic success in professional schools. When it is
remembered that all of the students included were nearing the
completion of their work and thus constituted a homogeneous
population, it seems a bit surprising that the correlations
should be as high as they have been found to be in this
investigation.

To a vocational counselor the group profiles are of as much
interest as the correlations between scores in the tests and
criteria of success. These profiles for engineering,
journalism, and medicine are presented in Figures 1, 2, and 3
respectively. The vertical axis of the profiles represents a
standard score scale, based on the performance of University of
Chicago students. A standard score of 1.00, for example,
represents a performance equal to that of University of Chicago
students one standard deviation above the mean, or the
equivalent of the 84th percentile.

The engineering group showed the highest single average factor
score on the Deduction factor, with the Spatial, Number,
Perception, and Induction factors being slightly below this
peak, and the Verbal and Memory factors being only slightly
above the average scores made by the University of Chicago
freshmen on these factor tests. This is an important point for
consideration, since, it will be recalled, the Verbal and
Memory factors correlated highest with the criterion for this
group. It should be noted that, although absolute scores may be
high or low, the correlation does not necessarily correspond in
relative magnitude, due to the highly selective nature of the
group. It appears that the Verbal and Memory factors are the
most predictive of academic success in engineering, since they
correlate highest with the grade point average; but, once the
professional group has been selected, the abilities represented
by these factors contribute little in the discrimination of the
group from the population average. This fact is important in
the practice of educational as well as of vocational
counseling.
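
The standard-score scale used for the profiles ties a score of 1.00 to the 84th percentile. The sketch below shows that conversion under the assumption that the reference scores follow a normal curve; the function name `percentile_from_standard_score` is hypothetical and is not taken from the study.

```python
from statistics import NormalDist


def percentile_from_standard_score(z: float) -> float:
    """Percentile rank of a standard score, assuming a normal curve."""
    return 100.0 * NormalDist().cdf(z)


# One standard deviation above the reference-group mean.
print(round(percentile_from_standard_score(1.0)))
```

A score of 0 on the same scale corresponds to the 50th percentile, which is the reference line against which the group profiles are read.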


The journalism group showed the highest 
single average factor score on the Perception 
factor, with the Verbal and Number factors 
being slightly below this peak. The Deduc- 
tion, Memory, Spatial, and Induction factors 
were slightly below the normative scores for 
these factors. It is significant to note that 
high achievement in those abilities represented 
by the Deduction, Memory, Spatial, and In- 
duction factors is apparently not essential 
for professional success in this field, since
scores for this professional group all fell be- 
low the average scores of University of Chi- 
cago students. The implications of this find- 
ing for counseling lie in the fact that low 
scores on such abilities would not eliminate 
the field as a vocational possibility if other
pertinent qualities were present. It should be
noted that the deviations for these factors 
are not as great as is the deviation above the 
norm for the Perception factor. 

The item of greatest significance about the medical group
profile is the fact that all of the factor scores are well
above the average for a normative population. The Perception
factor appears to have the highest discriminative value,
followed by the Spatial and Number factors. This may be
interpreted as indicating that the medical profession requires
a rating above the average on each of the factors, with an
emphasis on the Perception factor. It is interesting to observe
that the Induction factor was above the average score for a
normal population, although there was found a slightly negative
correlation between this factor score and the criterion.

Figure 1. Engineering
Figure 2. Journalism
Figure 3. Medicine

For the vocational counselor and the per- 
sonnel technician, the discovery of charac- 
teristic profiles for various professional groups 
is a significant finding. It is important to 
discover that the factor scores based upon 
normative data for various groups show char- 
acteristic patterns, differentiating the profes- 
sional groups from the average population in 
terms of the particular abilities required for 
success in the various fields. This fact sup- 
ports the contention that unitary measures of 
intelligence are not sufficient alone to char-
acterize the mental ability requirements for 
a professional group. After a minimum level 
of mental ability has been established, 
measures other than the total scores on intel- 
ligence tests are necessary for discrimination 
between the individuals in the field. 


The results of this study, as well as those 
reported by other investigators, suggest that 
tests for primary mental abilities will have 
some value in educational and vocational 
counseling. Whether they will replace present 
vocational aptitude tests remains a problem 
for further research. No doubt many of these 
aptitude tests measure the same or nearly the 
same fundamental abilities. If these abilities 
can be identified by the technique of factor 
analysis, and tests constructed to measure 
them, a real service will have been rendered 
to scientific vocational guidance. 





RELATIVE DIFFICULTY OF TEST ITEMS OF THE REVISED 
STANFORD-BINET: AN ANALYSIS OF RECORDS 
FROM A LOW INTELLIGENCE GROUP 


ARTHUR L. RAUTMAN 


Northern Wisconsin Colony and Training School 
Chippewa Falls, Wisconsin 


A. INTRODUCTION 


Because workers in the field of mental 
measurement have long recognized the fact 
that an individual’s behavior during a psy- 
chometric examination furnishes more infor- 
mation concerning the examinee’s abilities 
and his capacity for future development than 
can be expressed by a single numerical sym- 
bol such as the mental age rating or the in- 
telligence quotient, attempts have been made 
to interpret test performance more compre- 
hensively. Before a complete and qualitative 
evaluation of performance on a psychometric 
examination is possible, however, there must 
be available to the psychologist certain de- 
tails relating both to the examinee and to the 
test, including, among other factors, informa-
tion concerning the relative difficulty of the
individual test items. For the purpose of
comparing the individual items, research
workers have subjected the tests of the 
Stanford-Binet examination to various spe- 
cial types of analysis. Growdon (2) has at- 
tempted to re-evaluate the test items of the 
Stanford-Binet in order to convert this test 
into a point scale of performance. A qualita- 
tive and quantitative study of the responses 
to individual test items of this same examina- 
tion was made by Martinson and Strauss (3) 
and Strauss and Werner (4) who compared 
the responses of normal children with those 
of mentally defective children. Gillette (1) 
has further attempted an item analysis of the 
test responses of a group of children brought 
to a guidance clinic in order to determine the 
relative difficulty of tests within year levels 
six through twelve to enable her to gain a 
more comprehensive understanding of test 
performance. Since work with special groups 
such as the mentally defective has seemed to 
show that certain test items may be unsuit- 
able under actual clinical conditions, we have 
attempted to determine by means of a sta-
tistical analysis the relative difficulty of tests
within each year level of the Revised
Stanford-Binet, Form L, from Year II 
through Year XI for a group of subjects 
having a low intelligence rating. This paper 
summarizes certain results gained from this 
analysis. 

The data are based upon the test records 
of patients who were examined by the psy- 
chologist at a state institution for the care 
and training of mental defectives. All exam- 
inations were conducted by the writer, and 
no records of patients with special patholog- 
ical conditions and handicaps are included 
since the Stanford-Binet test is given only 
to patients who are able to give a reliable 
response on this type of examination. 

The study includes data from one thousand 
consecutively examined patients having an 
intelligence quotient rating of less than 80. 
The range in intelligence quotient was from 
13 through 79, and the mean intelligence 
quotient for the group was 49.75, sigma 
15.27. In mental age the group ranged from 
2 years to 11 years, 10 months, the average 
mental age being 6 years, 9.3 months (sigma 
2 years, 4.9 months). The range in chrono- 
logical age was high, from 3 years to 81 
years, and the average chronological age for 
the group was 19.63 years (sigma 9.67 
years). 

Since the data studied are drawn from a 
special group such as one would ordinarily 
find in an institution for mental defectives,
the findings of this study will necessarily be 
limited by this fact. However, although the 
group surveyed is not representative of the 
population in general, it probably is typical 
of the inmate populations usually found in 
public and private institutions for mentally 
defective patients. 


B. PROCEDURE 
In this study we have been concerned with 
the relative difficulty of the various tests 
within each particular year level and have 
made no attempt to evaluate the placement
of the items at a specific level. The relative 
difficulty of the tests within a year level was 
determined by calculating and comparing the 
percentage of individuals who were able to 
pass each test. In this manner the relative 
difficulty of the tests at each year level from 
Year II through Year XI was determined for 
the group having the same mental age as the 
test level under consideration; these groups 
are called the “Mental Age groups.” In addi- 
tion, the relative difficulty of the tests at each 
of these year levels was determined for the 
“Total group;” i.e., for the group consisting 
of all individuals to whom a particular test 
year had been given. 

After the relative difficulty of the tests 
within a year level had thus been established 
by comparing the percentage passing each 
test within a given test year, the reliability 
of the differences between these difficulty 
values (percentages passing) was determined. 
Since within each year level the percentages 
for the various tests were obtained on the 
same group of subjects, the standard error of 
the differences between two percentages was 
calculated by means of the complete formula: 

\[ \sigma_{\text{diff.}} = \sqrt{\sigma_{p_1}^{2} + \sigma_{p_2}^{2} - 2\, r_{p_1 p_2}\, \sigma_{p_1} \sigma_{p_2}} \]
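
A minimal sketch of this computation for two tests given to the same subjects follows. Everything in it is illustrative: the pass/fail records are invented, the function name `se_diff_correlated` is hypothetical, each sigma is assumed to be the usual sqrt(pq/N), and the correlation between the two tests is assumed to be the product-moment correlation of the pass/fail scores.

```python
import math
from statistics import mean


def se_diff_correlated(test_a, test_b):
    """Standard error of p1 - p2 when both tests are taken by the same group.

    test_a, test_b: equal-length lists of 1 (pass) and 0 (fail).
    """
    n = len(test_a)
    p1, p2 = mean(test_a), mean(test_b)
    s1 = math.sqrt(p1 * (1 - p1) / n)  # sigma of the first percentage
    s2 = math.sqrt(p2 * (1 - p2) / n)  # sigma of the second percentage
    # Correlation between the two pass/fail items.
    cov = mean([a * b for a, b in zip(test_a, test_b)]) - p1 * p2
    denom = math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    r = cov / denom if denom > 0 else 0.0
    return math.sqrt(max(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2, 0.0))


# Hypothetical records for ten examinees on two tests of one year level.
easy_test = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
hard_test = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
se = se_diff_correlated(easy_test, hard_test)
ratio = (mean(easy_test) - mean(hard_test)) / se
print(round(se, 3), round(ratio, 2))
```

Because the two tests are positively correlated, the correction term makes the standard error smaller than it would be for two separate groups of the same size.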

For both the Mental Age and the Total 
groups the data for each year are presented 
in tabular form, the tests in each case being 
identified by name and serial number, and 
listed in order of difficulty. The percentage 
of individuals passing each test is indicated, 
and there is also shown the serial number of 
each test from which a specific test is reliably 
different; that is, cases in which the critical 
ratio between the two tests was found to be 
3 or more. 

Since psychometric work with special 
groups like the mentally defective has often 
seemed to indicate that performance on cer- 
tain parts of the Stanford-Binet test is more
affected by life and school experiences than 
is performance on other test items, an 
attempt was made to study the influence of 
life age upon individual tests at each of the 
year levels included in this study. For this 
purpose it was necessary to compare, at each 
year level, the performance of groups of 
patients having approximately the same 
mental age but differing in chronological age. 
Such groups were obtained by selecting all 
individuals with a mental age corresponding 
in number to the particular year being an-
alyzed, and, in addition, those whose mental
age is included within the range from six
months below to six months above this level. 
Within each of these selected groups the test 
records were arranged according to the ex- 
aminees’ chronological ages, and then divided 
on the basis of chronological age into three 
sub-groups, a “younger,” a “middle,” and an 
“older” group. 

For each of these three age groups the per- 
centage of individuals passing each test was 
determined and the reliability of the differ- 
ences in relative difficulty between the groups 
was calculated. Since these data are drawn 
from separate groups and hence the correla- 
tions between them may be assumed to be 
zero, the following formula was employed in 
calculating the critical ratios: 


\[ \sigma_{\text{diff.}} = \sqrt{\sigma_{p_1}^{2} + \sigma_{p_2}^{2}} \]
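
The formula for separate groups can be tried on figures reported in the results. The sketch assumes that each sigma is sqrt(pq/N), which the text does not state explicitly; the function name `critical_ratio_independent` is hypothetical. The example uses the Year IV-0 picture identification test (Test 4), passed by 51% of 43 younger and 86% of 42 middle-group examinees.

```python
import math


def critical_ratio_independent(pct1: float, n1: int,
                               pct2: float, n2: int) -> float:
    """Critical ratio for two percentages from separate groups (r taken as 0)."""
    p1, p2 = pct1 / 100.0, pct2 / 100.0
    var1 = p1 * (1 - p1) / n1  # squared sigma of the first percentage
    var2 = p2 * (1 - p2) / n2  # squared sigma of the second percentage
    sigma_diff = math.sqrt(var1 + var2)
    return abs(p1 - p2) / sigma_diff


# Year IV-0, Test 4: younger group versus middle group.
print(round(critical_ratio_independent(51, 43, 86, 42), 2))
```

Under these assumptions the result agrees with the critical ratio of 3.76 reported for this comparison.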


The data showing the percentage of indi- 
viduals passing each test in the three age 
groupings for each year level are presented 
in table form. These tables also show the 
mental age range included in the year level 
studied, and, in addition, the range and the 
mean for both the chronological age and the 
intelligence quotient for each of the three age 
groups under comparison. 


C. RESULTS 
1. Year II-0


The rank order of the tests for the Mental
Age group and for the Total group at year
level II-0 is presented in Table 1. In the
Mental Age group the percentage of exam- 
inees passing the various test items ranges 
from 94% for the form board to 75% for 
identifying parts of body. Due to the rela- 
tively small number of individuals involved, 
however, none of these differences reach sta- 
tistical significance. 

For the Total group; i.e., for all individ- 
uals who took the tests at this level, the 
percentage passing a test ranged from 99% 
for the form board to 89% for identifying 
parts of body. As is indicated in the extreme 
right column of Table 1, the percentages 
passing form board, 99%, and word combi- 
nation, 99%, are each reliably different from 
the percentages passing identifying objects by 
name, 91%, and identifying parts of body, 
89%, the two most difficult tests at this year 
level. 
TABLE 1
Year II-0: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 16)
No.  Name               % Passing   Reliably diff. from Test No.
1    Form board             94
4    Block tower            94
6    Word comb.             94
5    Pict. vocab.           81
2    Ident. by name         75
3    Parts of body          75

TOTAL GROUP (N = 105)
No.  Name               % Passing   Reliably diff. from Test No.
1    Form board             99      2, 3
6    Word comb.             99      2, 3
4    Block tower            96
5    Pict. vocab.           93
2    Ident. by name         91      1, 6
3    Parts of body          89      1, 6


TABLE 2
Year II-0: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 2-0 Through 2-11

                Chronological Age     Intelligence Quotient    PERCENTAGE PASSING TEST
Group     N     Range          Ave.   Range     Ave.           1    2    3    4    5    6
Younger   22    3-9 to 8-7      5.8   23-76     45.5           95   91   91  100   86   95
Middle    22    8-9 to 13-8    10.9   16-33     23.0          100   95   91   95   91  100
Older     23    14-0 to 73-9   20.5   13-20     17.5          100   83   74   95   86   95


A group was then selected with a limited 
mental age range, from 2 years through 2 
years, 11 months. There were 67 individuals 
within this mental age range who had been 
given the Year II-0 tests. As described
above, the records of these individuals, thus 
selected for comparable mental ability, were 
divided on the basis of chronological age into 
three groups, a “younger,” a “middle,” and 
an “older” group (Table 2). The percentages
of examinees passing the various tests for
these age groups show no statistically signifi- 
cant differences. On the tests at this year 
level, life age, experiences, and schooling 
apparently have little effect upon perform- 
ance. 


2. Year II-6 


The tests at the II-6 year level, arranged 
in order of difficulty, are presented in Table 3. 





TABLE 3
Year II-6: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 61)
No.  Name               % Passing   Reliably diff. from Test No.
6    Form board: rot.       78      1, 2, 4
3    Naming obj.            72      4
5    Two digits             70
2    Parts of body          63      6
1    Ident. by use          61      6
4    Pict. vocab.           56      3, 6

TOTAL GROUP (N = 162)
No.  Name               % Passing   Reliably diff. from Test No.
5    Two digits             80      1
3    Naming obj.            79
6    Form board: rot.   [illegible]
2    Parts of body      [illegible]
4    Pict. vocab.           75
1    Ident. by use          74      5


TABLE 4
Year II-6: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 2-0 Through 3-5

                       Chronological Age     Intelligence Quotient    PERCENTAGE PASSING TEST
Group     N            Range          Ave.   Range     Ave.           1    2    3    4    5    6
Younger   35           3-9 to 8-9      5.8   23-76     48.5           71   83   71   60   83   69
Middle    [illegible]  8-9 to 14-0    10.8   16-38     26.1           59   62   76   76   76   71
Older     34           14-7 to 73-9   26.3   13-26     19.3           59   59   73   61   61   79
The actual difficulty values and the serial
numbers of the items from which each test
differs significantly can be obtained by in-
spection of the table. Identifying objects by
use and the picture vocabulary appear to be
the most difficult tests at this year level.

For the different age levels (Table 4),
identifying parts of body (Test 2) again
appears to be easier for the younger as com-
pared with the older group, difference 24%,
critical ratio 2.27. A similar difference was
also found for this same test as it appears on
the II-0 level, 17%, critical ratio 1.55. The
younger group also appears to do slightly
better than the older group on repeating two
digits (Test 5). However, none of the differ-
ences between the three age groups at this
level approach statistical significance.
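
Critical ratios such as these are elsewhere in the paper restated as "chances in 100" that a difference is a true one. The sketch below shows one way to make that conversion. It assumes a one-tailed reading of the normal curve with the ratio expressed in standard errors; the paper does not spell out its convention, and the function name `chances_in_100` is hypothetical.

```python
from statistics import NormalDist


def chances_in_100(critical_ratio: float) -> float:
    """One-tailed normal-curve chances in 100 that the true difference
    lies in the observed direction."""
    return 100.0 * NormalDist().cdf(critical_ratio)


# Ratios quoted for identifying parts of body, plus the criterion of 3.
for cr in (1.55, 2.27, 3.0):
    print(cr, round(chances_in_100(cr), 1))
```

On this reading a ratio of 3, the criterion adopted for a reliable difference, leaves well under one chance in 100 that the observed direction is wrong, while the two ratios quoted here fall short of it.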


3. Year III-o 


At the III-o year level the percentages 
ing each test show marked consistency 
(Table 5). In the case of both the Mental 
Age and Total groups, drawing a circle and 
stringing beads are easier than repeating 
three digits and picture memories, with dif- 
ferences that are clearly reliable. 
For the three age levels, repeating three 
digits (Test 6) is easiest for the younger 
group and most difficult for the middle group 
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younger and middle groups on this test, 37%, 
critical ratio 3.70, is the only difference at 
this year level within the range of statistical 
reliability. However, the difference on this 
same test between the older and younger 
groups is 25%, and a critical ratio of 2.44 
shows that there are 99 chances in 100 that 
this is a true difference. 


4. Year ITI-6 


The percentage passing each test at the 
III-6 year level is shown in Table 7. The 
only inconsistency in the order of difficulty is 
that picture vocabulary and identifying 
objects by use are rated fourth and fifth in 
order of difficulty for the Mental Age group 
and fifth and fourth respectively for the 
Total group. For the Mental Age and the 
Total groups, pictures I is reliably different 
from all other test items. For a low mentality 
group this item appears to be too difficult at 
this level to have much discriminative value, 
since only 10% of the Mental Age group and 
23% of the Total group were able to pass 
the test. For both groups, simple commands, 
comparison of sticks, and comprehension I 
are the easiest tests in the order named. 


At this level (Table 8) the only difference 
between the three age groups which ap- 
proaches statistical significance is comprehen- 





(Table 6). The difference between the sion J] (Test 6), the differences between the 
TABLE 5 
Year III-0: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
N=35 N=214 
TEST % Reliably TEST % Reliably 
Pass- _ diff. from Pass- _ diff. from 
No. Name ing Test No. No. Name ing Test No. 
FP EE 91 3, 4, 6 i eee 92 2, 3, 4, 6 
1 Stringing beads.._.. 88 4, 6 1 Stringing beads --- 91 2, 3, 4, 6 
Ff eee 4,6 3 Block bridge_______- 82 1, 2, 4, 5, 6 
8 Block bridge______-_- 69 4,5  % SS 67 1, 3, 4, 5, 6 
6 Three digits... _- 54 1, 2, & 6 Three digits.___.__. 61 1, 2, 3, 4, 5 
cr -, | aos 40 1, 2, 3, & & Fae Sek... «...... 56 1, 2, 3, 5, 6 
TABLE 6 
Year III-0: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 2-6 Through 3-11 
Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N nge Ave. Range Ave. 1 2 3 4 5 6 
Younger... 43 8-9 to 9-0 6.3 81-76 . 47.4 93 53 88 42 86 72 
Middle_... 43 9-0 to 16-10 11.6 17-43 . 27.3 90 59 79 53 93 35 
Older... -..- 48 17-2 to 73-9 26.5 17-26 22.3 98 67 77 51 95 47 
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TABLE 7 
Year III-6: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL =] GROUP TOEM Gaur 
TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
1 Simple commands... 70 4 1 Simple commands... 63 2, 3, 4, 5, 6 
3 Compar. sticks___-__- 58 4 3 Compar. sticks----__ 58 1, 2, 4, 5 
@ Gan B....«..... 58 4 © Geen &.......... 57 1,2,4 
a. .\,. =a 56 4 5 Ident. by use__-_--_- 54 1, 3,4 
5 Ident. by use______- 54 4 OS eee 51 1, 3, 4, 6 
5} Cares 10 1, 2, 3, 5, 6 > eee 23 1, 2, 3, 5, 6 
TABLE 8 
Year III-6: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 3-0 Through 4-5 
Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 D4 3 4 5 6 
Younger... 38 4-6 to 10-1 7.0 82-73 650.4 82 47 55 11 55 42 
Middle___. 37 10-6 to 18-5 14.1 23-39 28.6 70 61 59 28 62 72 
Gile..<.<. 388 18-6to58-11 31.0 20-30 25.6 63 61 68 29 55 71 
older and younger groups and between the 
middle and younger groups being 29% and 
30% respectively. There are approximately 
99 chances per hundred that these represent 
true differences. Apparently increased life 
age and experience give an examinee a slight 
advantage on this test item. 


5. Year IV-0 


Table 9 shows the percentage passing the 
various tests for the IV-0 year level. Both 
the Mental Age and the Total groups find the 
picture completion of a man: 1 point the 
most difficult; the differences between this 
and all other tests for this age level are sta- 
TABLE 9 
Year IV-0: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
N=35 N=271 
TEST % Reliably TEST % Reliably 
Pass- _ diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
S 2eewee.......a.. Te 3 2 Obj. from mem..---- 71 1, 3, 4, 5 
2 Obj. from mem..----- 77 3 @ Cope Be. ........ 70 1, 3,5 
& Fae Se... cu..--. 77 3 a Fee wee. .....5-. 67 1, 2, 3, 5 
=. “2. 71 3 ss eee 63 2, 3, 4, 6 
6 Compre. IT---.-..--- 59 Fees, Vee............ & 2, 3, 4, 6 
3 Pict. compl.:man... 43 1, 2, 4, 6 8 Pict. compl.:man__. 657 1, 2, 4, 5, 6 


TABLE 10 
Year IV-0: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 3-6 Through 4-11 


Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 5 6 
Younger... 43 5-5 to 12-9 8.7 28-73 46.8 49 86 53 51 79 37 
Middle.... 42 12-11 to 21-1 16.6 24-35 29.2 73 78 57 86 71 61 
Older..... 42 21-2 to 54-11 30.1 22-34 29.2 69 71 55 76 60 67 
tistically significant for both groups with the 
single exception of comprehension II for the 
Mental Age group. 

A greater percentage of the older group 
than of the younger group (Table 10) suc- 
ceeds on picture identification (Test 4) and 
also on comprehension II (Test 6). The dif- 
ference between the younger and older groups 
for the picture identification test is 25%, 
critical ratio 2.48, and that between the 
younger and middle groups is 35%, critical 
ratio 3.76. On the comprehension II test, the 
difference between the younger and older 
groups is 30%, the critical ratio showing that 
there are more than 99 chances in 100 that 
this difference is reliable. 
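
The critical ratios quoted above can be checked against the percentages and group sizes in Table 10. The sketch below assumes that the critical ratio is the difference between two independent percentages divided by the standard error of that difference, computed without pooling. The text does not state the formula, but this assumption reproduces the reported values of 2.48 and 3.76 for Test 4. The function name `critical_ratio` is introduced here for illustration only.

```python
import math

def critical_ratio(pct_a, n_a, pct_b, n_b):
    """Difference between two percentages divided by its standard error.

    Assumed formula (not stated in the text): unpooled standard error
    of the difference between two independent proportions.
    """
    p_a, p_b = pct_a / 100.0, pct_b / 100.0
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return abs(p_a - p_b) / se

# Table 10, Test 4: Younger 51% (N=43), Middle 86% (N=42), Older 76% (N=42)
cr_younger_older = critical_ratio(51, 43, 76, 42)
cr_younger_middle = critical_ratio(51, 43, 86, 42)
print(round(cr_younger_older, 2), round(cr_younger_middle, 2))  # 2.48 3.76
```
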

The older and middle groups also seem to 
have a slight advantage over the younger 
group on picture vocabulary (Test 1), and, 
although the differences are not statistically 
significant, there are, nevertheless, approxi- 
mately 98 chances in 100 that the obtained 
difference is reliable. 


6. Year IV-6 


At the IV-6 year level, test number 5, 
three commissions, is omitted and the Alter- 
nate test, picture identification, is substituted. 
The percentage passing for each of the tests 
on this age level is given in Table 11. For 
both the Mental Age and the Total groups, 
aesthetic appreciation and picture identifica- 
tion are reliably easier than all of the other 
tests. Opposite analogies appears to be the 
most difficult item in this group of tests and 
is reliably different from the other tests. 

In comparing the performance of the three 
age groups, we find that picture comparison 
(Test 3) is the only test for which a statis- 
tically reliable difference can be demonstrated 
(Table 12). On this item 37% more of the 
younger group than of the older group re- 
ceived a passing grade, the middle group 
coming between the two. 


7. Year V 


The tests of the V year level, listed in 
order of difficulty for both the Mental Age 
and Total groups, are given in Table 13. In 
both groups the square, folding a triangle, 
and definitions are reliably less difficult than 
picture completion of a man: 2 points and 
memory for sentences II. 

At this year level (Table 14) none of the 
differences between the various age groups 
can be considered completely reliable. The 
middle group did better than the older on the 
picture completion of a man (Test 1), while 
the older group seemed to have an advantage 
over the younger on definitions (Test 3), the 
critical ratios being 2.49 and 2.75 respec- 
tively. 


TABLE 11 
YEAR IV-6: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
N=49 N=298 
TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
1 Aesth. comp......... 78 2, 3, 4, 6 1 Aesth. comp.___----- 68 2, 3, 4, 6 
& Bee oe. ......... 78 2, 3, 4,6 re  * ~ pees 66 2, 8, 4, 6 
& 2eeeee........... & 1,6, A S Fae G........- 53 1,6, A 
3 Pict. compar........ 55 1,6,A 5 Few @eee......... 51 1,6,A 
2 Four digits.____. 47 1,6,A 4 Materials___..._.__- 50 1,6, A 
6 Opp. anal.I_______- 27 1, 2,3,4, A © Gee a........ 41 1, 2,3,4,A 
TABLE 12 
Year IV-6: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 4-0 Through 5-5 
Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 6 Alt. 
Younger... 47 5-6 to 14-6 10.0 81-73 60.0 79 45 74 51 43 70 
Middle.... 47 14-6 to 22-3 18.6 27-88 32.3 68 49 57 49 29 70 
Older..... 48 22-3 to 54-11 30.3 27-36 32.2 73 54 37 69 37 87 
TABLE 13 
YEAR V: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
N=145 N = 462 
TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
_ ae 98 1, 5, 6 2 Folding triangle-_-__- 88 1, 3, 5, 6 
2 Folding triangle_-__-- 93 1,5 eF- £4 eee 87 1, 5, 6 
3 Definitions____-__-_-- 93 1, 5 S Delweme........- 86 1, 2, 5, 6 
6 Counting obj._.__-_- 88 1, 4, 5 6 Counting obj....___- 81 1, 2, 3, 4, 5 
1 Pict. compl.: man (2) 81 2, 3, 4, 5, 6 6 Sent. mem. II______- 57 1, 2, 3, 4, 6 
5 Sent. mem. II_-_-_-_-- 56 1, 2, 3, 4, 6 i Pict. compl.: man (2) 652 2, 3, 4, 5, 6, 
TABLE 14 
YEAR V: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 4-6 Through 6—5 
Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 5 6 
vom. ie 8-1 to 14-7 11.0 32-72 60.2 81 92 88 96 55 91 
Middle._.. 95 14-8 to21-7 17.1 30-45 38.4 8e 90 95 94 62 85 
Givin. 94 21-10to57-0 28.9 80-43 = 37.1 74 91 98 97 65 86 


8. Year VI 


Because of the large number of cases upon 
which the data of the following year levels 
are based, even small differences in relative 
difficulty are statistically reliable. At the VI 
year level in both the Mental Age and the 
Total groups (Table 15), all tests show reli- 
ably different levels of difficulty with the fol- 
lowing three exceptions: for the Mental Age 


group, the difference between vocabulary and 
maze; and for the Total group, the difference 
between numbers and mutilated pictures, and 
also that between maze and picture com- 
parison. 


On vocabulary (Test 1) 28% more of the 
older group than of the younger group re- 
ceived a passing score (critical ratio 5.04), 
with the percentage for the middle group 


TABLE 15 
YEAR VI: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
N =164 N=711 
TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
S Diek pick. ......... 96 1, 2, 4, 5, 6 4 Number concepts. 76 1,3, 66 
4. Number concepts.... 89 1, 2, 3, 5, 6 S tee peee.......-- 75 1, 2, 5, 6 
1 Vocabulary-_-------- 82 2, 3, 4, 5 1 Vocabulary----_---_-- 68 2, 3, 4, 5, 6 
_- = Sia 79 2, 3, 4, 6 >_> > aa 66 1, 2, 3,4 
5 Pict. compar._-__--- 3 1, 2, 3, 4, 6 5 Pict. compar. ----_-- 66 1, 2, 3,4 
2 Bead chain I_-___--- 66 1, 3, 4, 5, 6 2 Bead chain t_______- 63 1, 3, 4, 5, 6 
TABLE 16 
YEAR VI: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 5-6 Through 7-5 
Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 5 6 
Younger... 109 8-8 to 14-8 11.4 39-76 54.7 61 86 96 91 88 79 
Middle.... 109 14-9 to 20-10 17.1 37-51 43.4 76 68 94 85 83 75 
Older..... 109 21-0 to 62-2 28.9 37-49 42.6 89 56 85 90 57 65 
between these two (Table 16). On the other 
hand, the younger group did reliably better 
than either the middle or older group on the 
bead chain (Test 2). Test 5, picture compar- 
ison, gave reliably different results in the 
percentage passing, the older group being 
31% and 26% below the young and middle 
groups respectively. 
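
The Year VI age-group comparisons can be tabulated in the same manner. The sketch below takes the Younger and Older rows of Table 16 (N = 109 in each group) and flags the tests whose critical ratio reaches 3.0. Both the unpooled standard-error formula and the cutoff of 3.0 are assumptions: the text gives neither explicitly, although it describes ratios below 3 as "not completely reliable". Because the original computation is not shown, small departures from the printed ratios are possible. The name `RELIABLE_CR` is hypothetical.

```python
import math

RELIABLE_CR = 3.0  # assumed cutoff for a "completely reliable" difference
N = 109            # size of each age group in Table 16

# Percentage passing Tests 1-6 (Table 16)
younger = [61, 86, 96, 91, 88, 79]
older = [89, 56, 85, 90, 57, 65]

def critical_ratio(pct_a, pct_b, n_a=N, n_b=N):
    """Signed critical ratio; positive when the first group does better."""
    p_a, p_b = pct_a / 100.0, pct_b / 100.0
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / se

ratios = [critical_ratio(y, o) for y, o in zip(younger, older)]
reliable_tests = [i + 1 for i, cr in enumerate(ratios) if abs(cr) >= RELIABLE_CR]

for test_no, cr in enumerate(ratios, start=1):
    favored = "younger" if cr > 0 else "older"
    print(f"Test {test_no}: CR = {abs(cr):.2f} (favors {favored})")
print(reliable_tests)  # [1, 2, 5]
```

Under these assumptions the flagged tests are vocabulary, bead chain, and picture comparison, which agrees with the items the text describes as reliably different.
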

Although both the younger and the middle 
groups had less difficulty than the older group 
with the mazes (Test 6), in neither case was 
the difference completely reliable, the critical 
ratios being 2.06 and 1.63 for the differences 
between the younger and older groups and 
between the middle and older groups 
respectively. 


9. Year VII 


The tests for the VII year level, arranged 
in order of difficulty, are given in Table 17. 
Tests in this group show a wide variation in 
relative difficulty, since the percentage pass- 
ing ranges from 82% to 33% for the Mental 
Age group and from 72% to 26% for the 
Total group. Opposite analogies seems to 
offer the most difficulty at this level, for only 
26% of the 735 individuals to whom this test 
was given were able to pass it. 

Of the various tests at this year level 
(Table 18), Test 1, picture absurdities, is the 
only one which shows a reliably different re- 
sult for the age groups: 23% more of the 
younger group than of the older were able to 
pass this test. 

Although the older group did better than 
either the middle or the younger on the two 
Similarities item (Test 2), the differences 
were not completely reliable, the critical 
ratios being 2.40 and 2.55 respectively. 


10. Year VIII 


All of the differences in relative difficulty 
of the tests at the VIII year level (Table 19) 
can be considered as statistically reliable 
with one exception: in the Mental Age 
group verbal absurdities and similarities and 
differences are not reliably different from 
each other. The order of difficulty is the 
same for both groups, vocabulary being the 
easiest and memory for sentences III the 
most difficult test at this year level. 

The older patients (Table 20) perform de- 
cidedly better on the vocabulary (Test 1) 
than the younger; the difference is 27%, 
critical ratio 4.75. On the Wet Fall item 
(Test 2), however, only 42% of the older 
group as compared with 80% of the younger 
were able to pass. On this test the perform- 
ance of the older group was not only reliably 
poorer than that of the younger, but reliably 
poorer than that of the middle group as well. 

A higher percentage of the younger exam- 
inees than of the older ones passed on Simi- 


TABLE 17 
Year VII: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
N=141 N=735 
TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
1 Pict. absurd. I____-_- 82 2, 3, 5, 6 4 Compre. III_.._.._- 72 1, 2, 3, 5, 6 
4 Compre. ITI__.._._- 75 2,5 S Heeee........... 58 1, 2, 4, 5, 6 
3 Diamond........... 174 1,3, & 1 - Pict. absur. I__...-. 57 2, 3, 4, 5, 6 
6 Five digits__ 69 1, 3, & 6 Five digits.......... 52 1, 2, 8, 4, 5 
2 Two simil.. 57 1, 3, 4, 5, 6 .- aa 44 1, 3, 4, 5, 6 
5 Opp. anal. I. 33 1, 2, 3, 4, 6 S Gee eee S...<.... 26 1, 2, 3, 4, 6 

TABLE 18 
Year VII: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 6-6 Through 8-5 
Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 5 6 
vow. . " 9-5 to 15-8 12.7 45-78 657.6 ps > 73 p 33 62 
Midd " 15-9 to 22-3 18.2 43-56 49.1 76 48 71 68 33 62 
Gom....... 22-4 to 66-0 80.6 438-56 49.7 65 66 73 17 34 69 
TABLE 19 
Year VIII: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
1 Vocabulary---.--.-- 92 2, 3, 4, 5, 6 1 Vocabulary......... 60 2, 3, 4, 5, 6 
5 Compre. Sectnateas : Se 1, 2, 3, 4, 6 5 Compre. IV__....--- 55 1, 2, 8, 4, 6 
= apy 65 1, 3, 4, 5, 6 S We Oieue<.<.-.. 47 1, 3, 4, 5, 6 
3 Verb. absurd. I_____- 64 1, 2, 5, 6 8 Verb. absurd.I...... 44 1, 2, 4, 5, 6 
_— *\ eee 53 1, 2, 5, 6 4 Co =r 1, 2, 3, 5, 6 
6 Sent. mem. III-_----- 39 1, 2, 3, 4, 5 6 Sent. mem. III_----- 37 1, 2, 3, 4, 5 
TABLE 20 


Year VIII: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 7-6 Through 9-5 


Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 5 6 
Younger... 74 9-0 to 16-6 13.6 51-78 60.0 70 80 55 64 76 45 
Middle_... 74 16-7 to 23-7 19.0 50-63 65.3 88 72 50 59 69 35 
Older...... 74 28-10 to81-10 33.0 50-62 65.4 97 42 49 42 78 56 


larities and differences (Test 4) also, there 
being 99 chances in 100 that the difference, 
22%, is a true difference. 


On memory for sentences III (Test 6) the 
middle group rated lower than either the 
younger or the older group. Although the dif- 
ference is not completely reliable, more of 
the older group than of the younger group 
passed this test. 


11. Year IX 


At the IX year level (Table 21), 46% of 
the Total group could solve the making 
change test as compared with only 26% who 
were able to solve verbal absurdities II. The 
table shows the tests arranged in order of 
difficulty, the percentage passing each test, 
and the serial number of each test from which 
its difficulty value is reliably different. 


TABLE 21 
Year IX: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
N=84 N=682 
TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
5 Making change...... 82 1, 2, 3, 4, 6 5 Making change...... 46 1, 2, 4, 6 
6 Four digits rev..... 66 1, 2, 4, 5 3 Designs............ 45 1, 2, 4, 6 
3 Designs............ 63 1, 2, 4, 5 6 Four digits rev..... 41 1, 2, 3, 4, 5 
1 Paper cutting I..... 52 2, 3, 5, 6 1 Paper cutting I..... 39 2, 3, 4, 5, 6 
4 ................... 50 3, 5, 6 4 ................... .. 1, 2, 3, 5, 6 
2 Verb. absurd. II.... 42 1, 3, 5, 6 2 Verb. absurd. II.... 26 1, 3, 4, 5, 6 
TABLE 22 
Year IX: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 8-6 Through 10-5 
Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 5 6 
Younger... .. 11-5 to 16-7 14.5 57-79 65.6 64 44 76 53 65 64 
Middle.... .. 16-9 to 22-9 19.2 57-69 63.9 59 48 66 57 76 56 
Older..... 54 22-11 to 81-10 34.1 57-69 62.0 37 33 48 43 76 68 
In comparing the influence of life age upon 
test performance at this level (Table 22), we 
find that designs (Test 3) is the only test 
having a truly reliable difference, 28% more 
of the younger than of the older group being 
able to pass the test item. 


Although the difference is not completely 
reliable, the older group did less well on the 
paper cutting I (Test 1) than either the 
middle or younger group, the critical ratios 
being 2.35 and 2.93 respectively. 


12. Year X 


At the X year level the order of difficulty 
for the test items is identical for the Mental 
Age group and the Total group. Table 23 
shows the tests of this level listed in order 
of difficulty. 


Picture absurdities II (Test 2) shows a 
reliable difference in favor of the younger 
group, the percentage passing being 43% 
higher for this group than for the older group 
(Table 24). Vocabulary (Test 1) again 
appears to be easier for the older group than 
for the younger, and, although the critical 
ratio of 2.86 shows that this difference is not 
completely reliable, there are, nevertheless, 
99 chances in 100 that this difference, 23%, 
is a true difference. 
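
Statements of the form "99 chances in 100" can be related to the critical ratio through the normal curve. The sketch below assumes that the conversion is the one-tailed area of the standard normal distribution lying below the critical ratio. This is a common convention, but the text does not spell it out, and `chances_in_100` is a hypothetical helper.

```python
import math

def chances_in_100(cr):
    """One-tailed normal probability, times 100, that the true difference exceeds zero.

    Assumed conversion; the text does not state how its figures were obtained.
    """
    return 100.0 * 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0)))

# Critical ratios mentioned in the text, plus the conventional value of 3.00
for cr in (2.48, 2.86, 3.00):
    print(f"CR {cr:.2f}: about {chances_in_100(cr):.1f} chances in 100")
```

On this reading, a critical ratio somewhat below 3 already corresponds to better than 99 chances in 100, which is consistent with a difference being described as highly probable yet "not completely reliable".
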
13. Year XI 


Table 25 shows the data for the various 
tests on the XI year level. For the Total 
group the percentage passing ranges from 
54% for problem situation to 18% for verbal 
absurdities III. 


None of the test items show a clear and 
reliable change with life age. Although 30% 
more of the middle than of the older group 
were able to pass the verbal absurdities III 
(Test 2), with the younger group between 
the two, the differences are not reliable. 


Age seems to act as a slight handicap for 
three similarities (Test 6) also. On this item 
the middle group again received the highest 
rating, 61% of the middle group and 41% 
of the younger group, as compared with only 
28% of the older group, being able to pass 
the test. 


D. SUMMARY AND CONCLUSIONS 


The test performances on the 1937 Revi- 
sion of the Stanford-Binet, Form L, of one 
thousand patients at a state institution for 
mental defectives were analyzed to determine 
the relative difficulty of test items at year 
levels from Year II through Year XI. Only 
patients who had an intelligence quotient of 
less than 80 were included in this study. 


TABLE 23 
YEAR X: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
N=73 N =466 

TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- _ diff. from 
No. Name ing Test No. No. Name ing Test No. 

1 Vocabulary___...... 17 2, 3, 4, 6 1 Vocabulary._._._..- 45 2, 3, 4, 6 

5 Word naming---- ._- 67 3,4 5 Word naming-.------ 44 2, 3, 4, 6 

zs aaa 64 1, 3,4  ¢ aa 1, 3, 4,5 

2 Pict. absurd. IT _--_- 63 1, 3, 4 2 Pict. absurd. II__-__- 42 1, 3, 4, 5 

_ °°} 49 1, 2, 5, 6 _ 8c 27 1, 2, 5, 6 

4 a 48 1, 2, 5, 6 4 ReasonsI____._____ 27 1, 2, 5, 6 

TABLE 24 
YEAR X: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M. A. Range: 9-6 Through 11-5 
Intelligence 

Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 5 6 
Younger... 48 12-8 to 17-9 15.3 63-79 70.3 67 83 42 48 69 62 
Middle__.. 47 17-11 to21-11 19.1 63-76 69.8 17 60 62 55 64 55 
Older... _-- 48 22-1 to 46-9 27.8 63-76 69.1 90 40 46 62 62 67 
TABLE 25 
Year XI: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES 
MENTAL AGE GROUP TOTAL GROUP 
TEST % Reliably TEST % Reliably 
Pass- diff. from Pass- diff. from 
No. Name ing Test No. No. Name ing Test No. 
4 Sent. mem. IV_----- 17 1.36 5 Problem sit.___.__-- 54 1, 2, 3, 4, 6 
38 Abstr. wds.I........ 15 1, 2,6 4 Sent. mem. IV_----- 33 1, 2, 3, 5, 6 
5 Problem sit..__.._-. 65 1,6 8 Abstr. wds. I.___-_- 80 1, 2, 4, 5, 6 
2 Verb. absurd. III.._.. 658 3,4 _ #1; greats 26 2, 8, 4, 5 
eS 3 a scecéce | ae 3, 4, 5 6 Three simil.......... 25 2, 3, 4,5 
a Cs SES 43 8, 4, 5 2 Verb. absurd. III.... 18 1, 3, 4, 5, 6 
TABLE 26 
YEAR XI: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS 
M.A. Range: 10-6 Through 11-10 
Intelligence 
Chronological Age Quotient PERCENTAGE PASSING TEST 
Group N Range Ave. Range Ave. 1 2 3 4 5 6 
Younger... 29 14-0 to 19-5 16.8 70-79 73.9 41 52 69 65 62 41 
Middle.... 28 19-6 to 23-8 21.0 70-78 74.1 88 61 78 71 64 61 
Older..... 29 23-9 to 73-5 31.0 70-79 74.1 31 31 72 69 72 28 


For this group the data show considerable 
and reliable differences in difficulty among 
the tests within each year level. If these 
tests are re-arranged according to relative 
difficulty, the new order will show variations 
from the order in which the test items are 
arranged in the standardized examination. 
Certain patterns of difficulty become evident 
upon inspection of the tables: for these 
patients tests like verbal absurdities, sentence 
memories, reasons, and completing the pic- 
ture of a man offer greater difficulties than 
other items such as the vocabulary and com- 
prehension groups of tests. 
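
Re-arranging the tests of a year level by relative difficulty amounts to sorting them by percentage passing. A minimal sketch follows, using the Total group percentages for Year VII as read from Table 17. The keys are the test numbers in the standardized order, and the variable names are illustrative.

```python
# Percentage of the Total group passing each Year VII test (Table 17),
# keyed by the test's position in the standardized examination.
pct_passing = {
    1: ("Picture absurdities I", 57),
    2: ("Two similarities", 44),
    3: ("Diamond", 58),
    4: ("Comprehension III", 72),
    5: ("Opposite analogies", 26),
    6: ("Five digits", 52),
}

# Easiest test first: sort by percentage passing, descending.
by_difficulty = sorted(pct_passing, key=lambda t: pct_passing[t][1], reverse=True)

for rank, test_no in enumerate(by_difficulty, start=1):
    name, pct = pct_passing[test_no]
    print(f"{rank}. Test {test_no} ({name}): {pct}% passing")
print(by_difficulty)  # [4, 3, 1, 6, 2, 5]
```

The resulting order differs from the standardized sequence 1 through 6, which is the kind of variation the paragraph above describes.
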


The same data were also studied to deter- 
mine the influence of life age upon test per- 
formance. Analysis shows that the younger 
groups give a reliably or nearly reliably bet- 
ter performance than the older groups of 
comparable mental ability on tests involving 
pointing or actual manual performance on 
the part of the examinee; that is, on tests 
such as identifying parts of body, picture 
comparison, commands, bead chains, mazes, 
designs, and paper cutting, as well as on tests 
involving a humorous situation, as, for ex- 
ample, picture absurdities and Wet Fall. 


The older groups, on the other hand, found 
less difficulty than the younger groups of like 


mental ability on tests of vocabulary, defini- 
tions, picture identification, and test items 
involving comprehension. Performance on 
these more verbal tests apparently is affected 
to a greater than ordinary degree by life age 
and by school and general experiences. 


Our data seem to indicate that for subjects 
of low intelligence, certain types of tests tend 
to be consistently more difficult than others, 
and, as a result, definite patterns of relative 
difficulty of the tests within the various year 
levels appear. A complete and qualitative 
evaluation of test performance, therefore, 
must necessarily include a consideration of 
the relative difficulty values of the tests 
within each year group, particularly those 
tests passed or failed at the upper and lower 
limits of the standard testing range. 


From our study we may also conclude that 
increased life age and experience appear to 
have a differential influence upon perform- 
ance on certain test items, and therefore, in 
psychometric work with individuals of low 
intelligence, at least, an understanding of the 
relative effect of chronological age will also 
be valuable whenever a qualitative interpre- 
tation of test data is desired. 